Abstract: Automating the transistor and wire-sizing process 
is an important step toward being able to rapidly design high- 
performance, custom circuits. This paper presents a circuit opti- 
mization tool that automates the tuning task by means of state- 
of-the-art nonlinear optimization. It makes use of a fast circuit 
simulator and a general-purpose nonlinear optimization package. 
It includes minimax and power optimization, simultaneous tran- 
sistor and wire tuning, general choices of objective functions and 
constraints, and recovery from nonworking circuits. In addition, 
the tool makes use of designer-friendly interfaces that automate 
the specification of the optimization task, the running of the 
optimizer, and the back-annotation of the results of optimization 
onto the circuit schematic. 

Particularly for large circuits, gradient computation is usually 
the bottleneck in the optimization procedure. In addition to 
traditional adjoint and direct methods, we use a technique called 
the adjoint Lagrangian method, which computes all the gradients 
necessary for one iteration of optimization in a single adjoint 
analysis. 

This paper describes the algorithms and the environment in 
which they are used and presents extensive circuit optimization 
results. A circuit with 6900 transistors, 4128 tunable transistors, 
and 60 independent parameters was optimized in about 108 min 
of CPU time on an IBM RISC System/6000, model 590. 

Index Terms — Adjoints, circuit tuning, nonlinear optimization, 
simulation, transistor sizing. 

I. Introduction, Motivation, and Previous Work 

AUTOMATING the circuit optimization process is an 
important step toward rapidly and robustly designing 
high-performance circuits. Particularly in the use of custom 
designs, manually sizing schematics for area, delay, and power 
is an iterative, slow, tedious, and error-prone approach with 
circuit simulation in the inner loop. The updating of transistor 
widths from one iteration to the next in this context relies on 
human intuition. However, the quest for optimal performance 
and the importance of short time-to-market make automatic 
circuit tuning increasingly important. In addition, automatic 
tuning has the benefit of facilitating design adaptation and 
reuse. Hence an automatic tuning (and retuning) capability is 
becoming crucial to the productive design of custom circuits. 

In the case of gradient-based dynamic tuning, function and 
gradient values are determined by means of a dynamic (time- 
domain) analysis of the underlying circuit. In the case of 
static optimization, a static timing analysis is relied upon to 
analyze new iterates produced during the optimization process. 
If circuit blocks are modeled by analytic delay equations, 
these equations can be differentiated symbolically to determine 
gradients. Unfortunately, this procedure is not applicable to 
custom designs because of the lack of availability of delay 
models for arbitrary transistor-level circuits. 

In any type of circuit tuning, the computation of gradients is 
often the bottleneck in the optimization procedure. Gradients 
are generally computed by the direct method [1] or the adjoint 
method [2]. The direct method can provide the gradients of all 
of the measurements with respect to a single parameter and 
thus requires as many simulations of the associated sensitivity 
circuit as the number of tunable parameters. By contrast, in the 
adjoint method, the gradients of one measurement with respect 
to all parameters can be obtained in a single simulation of the 
adjoint circuit but the simulation must be repeated as many 
times as the number of measurements. 
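The trade-off between the two methods can be made concrete by counting circuit solves per optimization iteration. The sketch below is illustrative only: the function names and the example problem sizes are assumptions, and the count ignores the convolution overhead of the adjoint method.

```python
def sensitivity_solves(method: str, n_params: int, n_measurements: int) -> int:
    """Number of sensitivity or adjoint circuit solves for a full set of gradients.

    direct  : one solve per tunable parameter (all measurements at once)
    adjoint : one solve per measurement (all parameters at once)
    """
    if method == "direct":
        return n_params
    if method == "adjoint":
        return n_measurements
    raise ValueError(f"unknown method: {method!r}")


def cheaper_method(n_params: int, n_measurements: int) -> str:
    """Pick the method with fewer solves (ties go to the direct method)."""
    direct = sensitivity_solves("direct", n_params, n_measurements)
    adjoint = sensitivity_solves("adjoint", n_params, n_measurements)
    return "direct" if direct <= adjoint else "adjoint"


# Hypothetical problem sizes, chosen only for illustration.
many_params = cheaper_method(n_params=60, n_measurements=8)
many_measurements = cheaper_method(n_params=4, n_measurements=30)
```

With many parameters and few measurements the adjoint method needs fewer solves, and the reverse holds when measurements outnumber parameters.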

There have been many attempts to automate the transistor-sizing 
problem. One of the earliest papers to apply gradient-based 
optimization, albeit to a specific network, was that 
of Hachtel and Rohrer [3]. A later paper [4] suggested a 
more general approach that also was intended for gradient- 
based nonlinear optimization. The latter used adjoint gradient 
computations and exploited sparse matrix methods. One class 
of methods [5], [6] is based on static timing analysis [7]. The 
delay of each cell is available either as a precharacterized 
analytic function of the transistor sizes or as an Elmore delay 
approximation. In particular, if the Elmore delay model [8], [9] 
is used, this overall delay is seen to be a posynomial function 
(a particular algebraic form, see [10]) of the transistor widths, 
and geometric programming techniques apply. By a simple 
mapping of variables, the objective is converted to a convex 
function [5] and hence any minimum of the latter is guaranteed 
to be a global minimum. 
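For reference, the variable mapping mentioned above can be written out as follows. The symbols $c_k$, $a_{ik}$, $w_i$, and $y_i$ are introduced here for illustration and are not notation taken from the cited papers.

```latex
% Posynomial in the transistor widths w_i > 0, with coefficients c_k > 0:
f(w) = \sum_{k} c_k \prod_{i} w_i^{a_{ik}}
% Substituting w_i = e^{y_i} gives a sum of exponentials of affine functions
% of y, which is convex in y:
f(e^{y}) = \sum_{k} c_k \exp\!\Big( \sum_{i} a_{ik}\, y_i \Big)
```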

The advantages of static-timing-based methods include ef- 
ficiency, the ability to handle large designs, and the fact that 
they do not require input patterns to carry out the tuning. One 



0278-0070/98$ 10.00 © 1998 IEEE 



of the significant disadvantages with these methods is that they 
are not applicable to full-custom circuit designs, since static 
timing analyzers usually rely on precharacterized library cells. 
In addition, the accuracy of static timing analysis is limited 
(in our experience, the accuracy is at best about ±25% when 
Elmore delays are used), making it unsuitable as a basis for 
tuning high-performance custom circuits. Finally, static timing 
analysis suffers from the false path problem, so the optimizer 
may be working hard to tune paths that are either irrelevant or 
can never be sensitized. Recently, power optimization has been 
proposed in this general framework [11]. Power is measured 
by probabilistic methods [12] and then approximated by a 
posynomial function. Further, simultaneous tuning of drivers 
and interconnect has been proposed in [13] and [14]. 

Tuning based on dynamic simulation overcomes many of 
the above limitations of static tuning. The accuracy is as good 
as the simulator employed, false paths are not a problem, 
and the method is applicable to any custom circuitry that 
the simulator can analyze. However, appropriate input patterns 
must be provided by the user. These methods (e.g., [15] and 
[16]) typically run SPICE in the inner loop to optimize such 
circuit performance functions as gain, area, delay, and phase 
margin. However, using SPICE iteratively is computationally 
expensive and significantly limits the size of the circuit that 
can be tuned. 

From an overall design perspective, static and dynamic 
methods complement each other at different stages of the 
methodology, depending on the type of design, and there is 
an important place for automatic tuning in both. In this paper, 
we present a method for tuning custom MOS circuits that uses 
dynamic simulation and gradient-based optimization. Unlike 
previous methods, salient features include minimax and power 
optimization, simultaneous transistor and wire tuning, recovery 
from nonworking circuits that might be encountered during the 
tuning (Section II-A), manufacturability modes, and general 
choices of functions and constraints. We emphasize that the 
ability to compute gradients efficiently is crucial to the success 
of this approach. A prototype implementation of our method, 
JiffyTune, optimizes circuits in about the CPU time of one 
SPICE analysis. 



II. Overview of JiffyTune 

This section provides an overview of the various high-level 
software components of JiffyTune and their interactions, as 
depicted in Fig. 1. The JiffyTune "engine" solves the following 
problem. We are given a circuit schematic, input signals, a 
set of circuit performance requirements, and a list of tunable 
transistors and wires with initial sizes. We wish to determine 
the optimal assignment of widths to tunable transistors and 
wires in order to achieve the requirements. The user interface 
makes it convenient for the user to specify the problem and 
review and accept the results of optimization. The interface 
is based on the schematic representation of the problem to be 
tuned and enables the designer to use the full functionality of 
the tuning tool by essentially "pointing and clicking." Once the 
problem is defined, control is passed to the JiffyTune engine, 
where the administration is handled by the JiffyTune block. 
Fig. 1. High-level view of JiffyTune. 

The tuning process is iterative. At each iteration, the fast circuit 
simulator, Simulation Program for Electronic Circuits and 
Systems (SPECS) [17], [18], provides the current function 
values and, on demand, the corresponding sensitivities to 
LANCELOT [19]-[21]. LANCELOT is a large-scale, general-purpose 
nonlinear optimization package. Subsequently, the 
optimizer attempts to determine a better point or decides that 
the tuning is completed. In the former case, we return to 
SPECS, while in the latter case we leave the inner block and 
return to the interface, passing control back to the designer, who 
can choose to back-annotate the results onto the schematic. 
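The iteration described above can be sketched as a simple loop. This is a minimal illustration, not the actual tool: `tune`, `toy_simulator`, and all numeric values are hypothetical, and plain projected gradient descent stands in for the real optimizer.

```python
from typing import Callable, List, Tuple

# Stand-in for the simulator: widths -> (objective value, gradient).
Simulator = Callable[[List[float]], Tuple[float, List[float]]]


def tune(simulate: Simulator, x0: List[float], lower: List[float],
         upper: List[float], step: float = 0.1, max_iters: int = 50,
         tol: float = 1e-9) -> Tuple[List[float], float]:
    """Alternate between simulation and an optimizer step until no progress."""
    x = list(x0)
    best_x, best_f = list(x), float("inf")
    for _ in range(max_iters):
        f, grad = simulate(x)                 # function values and sensitivities
        if f < best_f:
            best_x, best_f = list(x), f       # remember the best point so far
        # Placeholder optimizer step: gradient descent clipped to simple bounds.
        x_new = [min(max(xi - step * gi, lo), hi)
                 for xi, gi, lo, hi in zip(x, grad, lower, upper)]
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            break                             # no better point: tuning is complete
        x = x_new
    return best_x, best_f


def toy_simulator(widths: List[float]) -> Tuple[float, List[float]]:
    """Hypothetical one-parameter 'circuit' with objective (w - 3)^2."""
    w = widths[0]
    return (w - 3.0) ** 2, [2.0 * (w - 3.0)]


best_widths, best_value = tune(toy_simulator, x0=[0.5], lower=[0.0], upper=[2.0])
```

In this toy problem the unconstrained minimum lies outside the bounds, so the loop stops at the upper bound on the width.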

The use of a state-of-the-art optimization technique based on 
gradients, along with efficient simulation, gives the flexibility 
and power to tune larger circuits than have previously been 
tuned dynamically, in addition to the ability to obtain close to 
optimal circuits rapidly. Subsequent sections contain detailed 
descriptions of the individual components. 

A. JiffyTune 

The JiffyTune block in Fig. 1 performs the administrative 
portion of the tuning task. The specification of circuit optimiza- 
tion problems is handled via a control file and a corresponding 
control-file grammar. The control file contains the following 
information. 

1) Parameters: This section contains a list of tunable transistors 
and wires, their initial widths, and simple bounds. 
Tunable parameters can be ratioed to (i.e., declared to be 
a fixed multiple of) other tunable parameters, and the user 
interface allows grouping of instances of similar structures so 
that they track each other during tuning. Thus, for example, 
the cells of an n-bit wide multiplexer can be grouped to 
ensure that the cells stay identical through the tuning process, 
consequently lending themselves to a structured, regular lay- 
out. The number of independent parameters is in general less 
than the total number of parameters. This reduction is not 
artificial, but is a result of the design and layout process 
where large circuits are not built from scratch, but rather 
common building blocks are replicated in many areas of the 
same design. Furthermore, the designer has a general grasp of 
how the circuit functions and is in a position to choose the 
independent parameters. This is especially true when there 
are important layout considerations that the designer takes 
into account. Therefore, when the problem is presented to 
JiffyTune, we assume that the independent parameters have 
been chosen and variables have been grouped. 

2) Measurements and Functions: A measurement is a 
crossing time, power, or area. In the absence of layout 
information, area is modeled by the sum of the tunable 
parameter widths. Functions consist of any linear combination 
of measurements, so, for example, delays and rise/fall times are 
typically the difference of two crossing times. Each function 
has a weight, a target, and a relation. For example, a relation 
of "less than" implies that this function should be less than the 
target value. Alternately, a function can be "minimized," which 
means that the optimizer will treat it as an objective function. 
Weights can be used to explore various tradeoffs in tuning 
the circuit; these are particularly important when functions of 
different quantities (area, delay, power) are being optimized. 
Any number of functions can be grouped into a minimax 
function. A minimax function implies that the largest of some 
number of functions needs to be minimized. For example, the 
statement of the problem might be to minimize the delay of 
the worst of three paths through a combinational logic block. 

3) Controls: This section provides administrative informa- 
tion like the maximum number of iterations, the layout grid 
for rounding transistor widths at the end of optimization, and 
the location of the device model files. 
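The relationship between measurements, functions, and a minimax group can be sketched as follows. The data layout, the names `evaluate` and `minimax`, and the crossing times are assumptions made for illustration; they do not reflect the actual control-file grammar.

```python
from typing import Dict, List, Tuple

# A function is a linear combination of measurements:
# a list of (coefficient, measurement name) pairs.
LinearFunction = List[Tuple[float, str]]


def evaluate(function: LinearFunction, measurements: Dict[str, float]) -> float:
    """Value of one function given the measured quantities."""
    return sum(coeff * measurements[name] for coeff, name in function)


def minimax(functions: List[LinearFunction], measurements: Dict[str, float]) -> float:
    """Value of a minimax group: the largest of its member functions."""
    return max(evaluate(f, measurements) for f in functions)


# Hypothetical crossing times in nanoseconds.
crossings = {"in": 0.10, "out_a": 0.85, "out_b": 1.20, "out_c": 0.95}

# Each delay is the difference of two crossing times.
delays = [[(1.0, name), (-1.0, "in")] for name in ("out_a", "out_b", "out_c")]

worst_delay = minimax(delays, crossings)
```

Here `worst_delay` is the delay of the slowest of the three paths, which is the quantity a minimax objective would ask the optimizer to reduce.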

JiffyTune reads the control file and internally represents 
the problem in a format that is understood by LANCELOT. 
JiffyTune also provides to LANCELOT a callable routine that 
will accept a set of wire and transistor widths, perform a 
SPECS simulation, and return function and gradient values 
in the form required by LANCELOT. Then JiffyTune begins a 
LANCELOT optimization. At each iteration, JiffyTune keeps 
track of the best results so far. One of the main functions of 
the JiffyTune block is to chain rule and combine gradients 
to provide to LANCELOT the gradients of various functions 
with respect to independent variables only. Typically, 25-30 
iterations are required for convergence. The default maximum 
number of iterations in JiffyTune is 50. Because the bulk of 
the cost is in the simulation (over 90% in our benchmark 
problem set), there is a high incentive for making heroic 
efforts to reduce the number of optimization iterations, even 
beyond what optimizers would typically do. For example, 
the recently implemented slack updating method described 
in Section IV-B has led to fewer iterations being required 
in general. Further, to decrease the number of steps and the 
sensitivity to noisy simulation, LANCELOT is encouraged to 
take larger steps. This could lead to a nonworking circuit 
(e.g., a circuit in which a measured signal transition does 
not even occur in the simulation), and a recovery mechanism 
is implemented whereby the step size is decreased and the 
simulation rerun. 
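The recovery mechanism can be pictured with a short sketch. The text states only that the step size is decreased and the simulation rerun; the shrink factor, the retry limit, and the names `step_with_recovery` and `toy_simulator` below are assumptions.

```python
from typing import Callable, List, Optional


def step_with_recovery(simulate: Callable[[List[float]], Optional[float]],
                       x: List[float], step: List[float],
                       shrink: float = 0.5, max_retries: int = 5) -> List[float]:
    """Try x + step; if the circuit does not work, shrink the step and rerun.

    `simulate` returns None when a measured transition never occurs
    (a nonworking circuit) and the objective value otherwise.
    """
    scale = 1.0
    for _ in range(max_retries + 1):
        trial = [xi + scale * si for xi, si in zip(x, step)]
        if simulate(trial) is not None:
            return trial
        scale *= shrink
    return list(x)  # give up and stay at the last working point


def toy_simulator(widths: List[float]) -> Optional[float]:
    """Hypothetical circuit that stops working when the width exceeds 2.0."""
    return None if widths[0] > 2.0 else widths[0]


accepted = step_with_recovery(toy_simulator, x=[1.0], step=[4.0])
```

The full step and the half step both land on a nonworking circuit in this toy case, so the quarter step is the first one accepted.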

B. SPECS 

SPECS is a fast circuit simulator that uses simplified device 
models and event-driven techniques. JiffyTune calls SPECS 
in the inner loop to evaluate the circuit and provide function 
and gradient values. SPECS and its sensitivity computation 
capabilities are described in more detail in Section III. 



C. LANCELOT 

LANCELOT [19]-[21] is a general-purpose large-scale nonlinear 
optimization package that handles simple bounds and 
general constraints. JiffyTune provides the problem descrip- 
tion and initial transistor and/or wire sizes to LANCELOT. 
LANCELOT repeatedly calls SPECS with different param- 
eter size settings and builds a model of the "performance 
surface" of the circuit. Details regarding LANCELOT and 
its application to the circuit-tuning problem are provided 
in Section IV. In addition to LANCELOT, the optimization 
package MINOS [22] has been integrated into JiffyTune using 
the tools that accompany the optimization testing environment 
CUTE [23]. MINOS integration has been used only for 
comparisons and as a "sanity check." 

III. SPECS and Time-Domain Sensitivity Computation 

SPECS is a fast circuit simulation program. On average, 
it is 70 times faster than AS/X, an IBM internal SPICE-like 
circuit analysis program [24]. SPECS achieves this speed by 
using simplified device models and event-driven techniques to 
efficiently simulate MOSFET circuits in the time domain, and 
it has been used in production mode in numerous integrated- 
circuit designs. The device modeling assumptions in SPECS 
restrict its relative timing accuracy to ±5%, and hence Jiffy- 
Tune can only tune to within this accuracy. Although JiffyTune 
uses SPECS to evaluate the circuit being optimized, this paper 
will not describe SPECS in any detail, except for the new 
adjoint Lagrangian computations. The reader is referred to 
[17], [18], and [25] for a more comprehensive explanation. 

A. Sensitivity Computation 

SPECS uses simplified device models that consist of piece- 
wise constant I-V characteristics in multiple dimensions and 
grounded, linear capacitances. These simplifications allow 
efficient, incremental, time-domain sensitivity computation 
[26]-[28]. Both the adjoint method [2], [25] and the direct 
method [1] have been implemented. In the direct method, 
branch constitutive relations (device characteristics) are di- 
rectly differentiated with respect to the sensitivity parameter 
of interest, and the circuit reflecting these differentiated equa- 
tions, called the sensitivity circuit, is solved to obtain the gra- 
dients. Since SPECS uses piecewise constant device models, 
the sensitivity circuit consists of disconnected capacitances for 
large subintervals of time, with occasional impulses of currents 
flowing between these capacitances at times corresponding to 
events in the nominal simulation. Thus, the solution of the 
sensitivity circuit is extremely efficient. In the direct method, 
the sensitivities of all functions with respect to one parameter 
are computed with a single solution of the sensitivity circuit. 

In the adjoint method, elements are replaced by adjoint 
equivalents based on Tellegen's theorem (see, for example, 
[2] and [25]). Again, in the case of SPECS, the circuit is 
very simple and lends itself to efficient solutions. In this case, 
however, time is run backward in the adjoint circuit, and the 
waveforms of the adjoint circuit are convolved with those of 
the nominal circuit to obtain the required sensitivities. The 
gradients of one function with respect to all parameters are 






computed in a single solution of the adjoint circuit. Hence, 
when there are sufficiently more parameters than functions 
to justify the overhead of convolution, the adjoint method is 
advantageous. 

Note that once the approximation in the simplified device 
models is accepted, the computation of the functions and 
gradients is exact. After the sensitivity circuit is solved in 
either method, gradients are chain-ruled and combined to 
obtain the sensitivity of each function with respect to all 
ramifications of varying the tunable parameter. We remark 
that when the width of a transistor varies, its source and drain 
diffusion capacitance and all the intrinsic MOSFET parasitic 
capacitances change. Consequently, each of these is submitted 
as an internal sensitivity parameter, and then all the gradients 
are postprocessed and combined appropriately. The flavor of 
these computations is captured by the following simplified 
equation: 



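$$\frac{df}{dW} = \frac{\partial f}{\partial W_{\mathrm{eff}}}\frac{\partial W_{\mathrm{eff}}}{\partial W} + \frac{\partial f}{\partial CD_{\mathrm{total}}}\frac{\partial CD_{\mathrm{total}}}{\partial W} + \frac{\partial f}{\partial CS_{\mathrm{total}}}\frac{\partial CS_{\mathrm{total}}}{\partial W} + \frac{\partial f}{\partial CG_{\mathrm{total}}}\frac{\partial CG_{\mathrm{total}}}{\partial W} \tag{1}$$

where $f$ is the sensitivity function of interest, $W$ is the 
transistor width (sensitivity parameter), $W_{\mathrm{eff}}$ is the effective 
width, and $CG_{\mathrm{total}}$, $CS_{\mathrm{total}}$, and $CD_{\mathrm{total}}$ are the sums of the 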
capacitance components at the gate, source, and drain nodes, 
respectively. Subsequently, (1) is further expanded in terms of 
the device model parameters. 

SPECS assumes that all tunable wires are modeled by RC 
circuits. To compute gradients with respect to wire parameters, 
all resistors and capacitors that depend on the parameter of 
interest are first identified. The gradients with respect to these 
element values are computed and then chain-ruled with the 
gradients of the element values with respect to the parameters 
of interest. Finally, the results of the chain ruling are summed 
over the RC elements of each wire model. 
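The chain-rule-and-sum step for a tunable wire can be sketched as below. The one-segment wire model (resistance inversely proportional to width, capacitance proportional to width) and every number are hypothetical, chosen only to make the arithmetic visible.

```python
from typing import List, Tuple


def wire_gradient(elements: List[Tuple[float, float]]) -> float:
    """Chain-rule and sum over the RC elements of one wire model.

    Each entry is (df_de, de_dp): the gradient of the function with respect
    to an element value, and the gradient of that element value with respect
    to the wire parameter.
    """
    return sum(df_de * de_dp for df_de, de_dp in elements)


# Hypothetical one-segment wire model:
#   R = r_sheet * length / width,  C = c_area * length * width.
r_sheet, c_area, length, width = 0.1, 0.02, 100.0, 2.0
dR_dw = -r_sheet * length / width ** 2   # resistance falls as the wire widens
dC_dw = c_area * length                  # capacitance grows as the wire widens

# Assumed simulator outputs: gradients with respect to the element values.
df_dR, df_dC = 0.4, 0.3

df_dw = wire_gradient([(df_dR, dR_dw), (df_dC, dC_dw)])
```

The resistance and capacitance contributions have opposite signs here, which is the usual tension when widening a wire.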

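Voltage-crossing sensitivities are computed as follows (see, 
for example, [29]). Let $v_N$ be the node voltage that crosses 
the level $v_{\mathrm{cross}}$ at time $t_{\mathrm{cross}}$ in the nominal circuit. We want 
to find $dt_{\mathrm{cross}}/dp$, where $p$ is the sensitivity parameter. Hence 

$$v_{\mathrm{cross}} = v_N|_{t=t_{\mathrm{cross}}}. \tag{2}$$

Differentiating 

$$\frac{dv_{\mathrm{cross}}}{dp} = 0 = \left.\frac{\partial v_N}{\partial t}\right|_{t=t_{\mathrm{cross}}} \frac{dt_{\mathrm{cross}}}{dp} + \left.\frac{\partial v_N}{\partial p}\right|_{t=t_{\mathrm{cross}}} \tag{3}$$

reflecting the fact that the node voltage is a function of time 
and the sensitivity parameter $p$. Thus 

$$\frac{dt_{\mathrm{cross}}}{dp} = -\frac{\left.\dfrac{\partial v_N}{\partial p}\right|_{t=t_{\mathrm{cross}}}}{\left.\dfrac{\partial v_N}{\partial t}\right|_{t=t_{\mathrm{cross}}}}. \tag{4}$$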



Hence the required sensitivity is the negative of the quotient of 
the sensitivity of the node voltage to the parameter at the time 
of crossing and the slope of the node voltage in the nominal 
circuit at that same instant. This concept is referred to as an 
"event functional" in [29]. 

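A quick numerical check of the crossing-time sensitivity in (4) is sketched below. The exponential waveform and the parameter value are assumptions chosen so that the answer is known in closed form; they are not taken from the simulator.

```python
import math


def crossing_sensitivity(dv_dp: float, dv_dt: float) -> float:
    """Equation (4): dt_cross/dp = -(dv/dp) / (dv/dt), both taken at t_cross."""
    return -dv_dp / dv_dt


# Toy waveform (an assumption for illustration): v(t, p) = 1 - exp(-t / p),
# where p is a time constant. It crosses v_cross = 0.5 at t_cross = p * ln 2.
p = 2.0
t_cross = p * math.log(2.0)

dv_dp = -(t_cross / p ** 2) * math.exp(-t_cross / p)  # partial of v w.r.t. p
dv_dt = (1.0 / p) * math.exp(-t_cross / p)            # slope of v at the crossing

sensitivity = crossing_sensitivity(dv_dp, dv_dt)
```

Since $t_{\mathrm{cross}} = p \ln 2$ for this waveform, the exact sensitivity is $\ln 2$, which is what the quotient reproduces.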


B. Adjoint Lagrangians 

The number of time-domain gradients computed during a 
typical JiffyTune run is often in the millions. Hence gradient 
computation must be extremely efficient to make this process 
feasible. To address this problem, particularly in the case 
of larger circuits, we introduce the concept of using adjoint 
Lagrangians for efficient gradient computation. As was im- 
plicit in [4] and [2], adjoints can be applied to any composite 
scalar functions. The novelty of the adjoint method proposed 
in this paper is that the merit function under consideration, 
the augmented Lagrangian function, is a nonlinear function of 
these measurements that also involves optimization parame- 
ters such as Lagrange multipliers and the penalty parameter. 
However, because of the close integration of the optimizer and 
the simulator, it is possible to excite the adjoint circuit in such 
a manner as to obtain the gradients of the merit function in a 
single adjoint simulation. 

We first demonstrate the basic idea by means of a simple 
example. Referring to Fig. 2, let us assume that the circuit 
to be optimized has just one input and one output. Consider 
the problem of minimizing $d_1$ subject to $d_2 = T$, where 
$d_1$ denotes the 50% crossing time of the falling transition 
at the single output of interest, $d_2$ is the 50% crossing 
time of the rising transition, and $T$ is a constant target 
value. The tunable parameters are $x_1, x_2, \ldots, x_n$. The output 
waveform is shown in Fig. 2(a). Let us further assume that 
corresponding to this circuit the nonlinear optimizer builds an 
augmented Lagrangian merit function [30], [31] of the form 
$\Phi = d_1 + \lambda (d_2 - T) + \frac{1}{2\mu}(d_2 - T)^2$, where $\lambda$ is the Lagrange 
multiplier corresponding to the constraint and $\mu$ is a penalty 
parameter used to weight the quadratic augmentation of the 
Lagrangian. At each iteration, the simulator is required to 
compute $d_1$, $d_2$, $\partial d_1/\partial x_i$, and $\partial d_2/\partial x_i$ for all $i$. The quantities 
$d_1$ and $d_2$ are computed by a nominal transient analysis, as 
shown in Fig. 2(a). We will first describe how a measurement-at-a-time 
adjoint analysis method is used to determine the 
sensitivities. Initially, an adjoint circuit is appropriately formed 
and configured as shown in Fig. 2(b). The adjoint circuit is 
excited by a current source with a unit Dirac impulse at time 
corresponding to $d_1$, i.e., $\delta(t - d_1)$. Time and control are 
reversed during the analysis of the adjoint circuit, and the 
nominal waveforms are convolved with the adjoint waveforms 
to yield 



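$$\left.\frac{\partial v}{\partial x_i}\right|_{t=d_1} = \mathrm{conv}(x_i), \quad \forall i \tag{5}$$

where $v$ is the output signal of interest and $\mathrm{conv}(x_i)$ represents 
the convolution between the appropriate nominal and adjoint 
waveforms for each problem variable. The required gradients 
are then computed using the formula 

$$\frac{\partial d_1}{\partial x_i} = -\frac{\left.\dfrac{\partial v}{\partial x_i}\right|_{t=d_1}}{\left.\dfrac{\partial v}{\partial t}\right|_{t=d_1}} = -\frac{\left.\dfrac{\partial v}{\partial x_i}\right|_{t=d_1}}{\dot{v}|_{t=d_1}} = \frac{-\mathrm{conv}(x_i)}{\dot{v}|_{t=d_1}} \tag{6}$$

where $\dot{v}|_{t=d_1}$ is the slope of the nominal voltage waveform 
at time $d_1$. Next, the adjoint analysis is repeated as shown 
in Fig. 2(c) to similarly determine the gradients $\partial d_2/\partial x_i$ for 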






IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 17, NO. 12, DECEMBER 1998 



Fig. 2. Demonstration of the adjoint Lagrangian formulation by means of an example. (a) Nominal simulation. (b) Adjoint analysis #1. (c) Adjoint analysis #2. (d) Adjoint Lagrangian analysis. 




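all $i$. Finally, the gradients are assembled by the optimizer as 
follows: 

$$\frac{\partial \Phi}{\partial x_i} = \frac{\partial d_1}{\partial x_i} + \left[\lambda + \frac{(d_2 - T)}{\mu}\right]\frac{\partial d_2}{\partial x_i}. \tag{7}$$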



The method described above requires two adjoint analyses 
and two sets of convolution integrals. Instead, our adjoint 
Lagrangian method recognizes that if one assumes that the 
coefficients of $\partial d_1/\partial x_i$ and $\partial d_2/\partial x_i$ are known, from (7), the 
gradients of the merit function can be considered as a linear 
combination of the gradients of the circuit measurements since 
one could rewrite it as 

$$\frac{\partial \Phi}{\partial x_i} = \frac{\partial}{\partial x_i}\left[ h_1\, v|_{t=d_1} + h_2\, v|_{t=d_2} \right]. \tag{8}$$

However, although in general nonlinear, the coefficients $h_1$ 
and $h_2$ can be treated as constants once the nominal simulation 
has been completed. Thus two impulses of heights 

$$h_1 = \frac{-1}{\dot{v}|_{t=d_1}} \tag{9}$$

and 

$$h_2 = \frac{-\left[\lambda + \dfrac{(d_2 - T)}{\mu}\right]}{\dot{v}|_{t=d_2}} \tag{10}$$

are applied during a single adjoint analysis. Note that the 
heights of the impulses are functions of the nominal simulation 
results ($\dot{v}|_{t=d_1}$, $\dot{v}|_{t=d_2}$, and $d_2$) and the parameters $\lambda$ and $\mu$ 



from the optimization. The analysis and convolution integrals 
are carried out as before to obtain 



$$\frac{\partial \Phi}{\partial x_i} = \mathrm{conv}(x_i), \quad \forall i. \tag{11}$$



Thus, two adjoint analyses have been replaced by one by 
taking advantage of the fact that the optimizer needs only 
the gradient of $\Phi$ rather than the gradients of the individual 
components of $\Phi$. Hence, a single adjoint analysis is enough to 
determine the gradients of $\Phi$ with respect to all the variables 
of the problem, as shown in Fig. 2(d). 
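The assembly of the merit-function gradient from the two measurement gradients can be checked numerically. The sketch assumes the augmented Lagrangian form $\Phi = d_1 + \lambda(d_2 - T) + (d_2 - T)^2/(2\mu)$; the toy expressions for $d_1$ and $d_2$ and all parameter values are invented for illustration and do not come from a circuit.

```python
from typing import List


def merit(d1: float, d2: float, target: float, lam: float, mu: float) -> float:
    """Augmented Lagrangian merit function for: minimize d1 subject to d2 = T."""
    return d1 + lam * (d2 - target) + (d2 - target) ** 2 / (2.0 * mu)


def merit_gradient(grad_d1: List[float], grad_d2: List[float],
                   d2: float, target: float, lam: float, mu: float) -> List[float]:
    """Linear combination of measurement gradients with a frozen weight.

    The weight depends only on the nominal result d2 and on the optimizer
    parameters, so it is a constant once the nominal simulation is done.
    """
    weight = lam + (d2 - target) / mu
    return [g1 + weight * g2 for g1, g2 in zip(grad_d1, grad_d2)]


# Toy stand-ins for the two crossing times as functions of two widths.
def d1_of(x: List[float]) -> float:
    return 1.0 / x[0] + 2.0 / x[1]


def d2_of(x: List[float]) -> float:
    return x[0] + 0.5 * x[1]


x = [1.0, 2.0]
target, lam, mu = 1.5, 0.3, 0.1
grad_d1 = [-1.0 / x[0] ** 2, -2.0 / x[1] ** 2]
grad_d2 = [1.0, 0.5]
gradient = merit_gradient(grad_d1, grad_d2, d2_of(x), target, lam, mu)

# Cross-check against a central finite difference of the merit function.
eps = 1e-6
finite_difference = []
for i in range(len(x)):
    up = list(x)
    up[i] += eps
    down = list(x)
    down[i] -= eps
    finite_difference.append(
        (merit(d1_of(up), d2_of(up), target, lam, mu)
         - merit(d1_of(down), d2_of(down), target, lam, mu)) / (2.0 * eps))
```

The agreement between `gradient` and `finite_difference` illustrates why only the combined gradient, and not the individual ones, has to be produced by the simulator.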

In general, if an optimization problem involves m measure- 
ments, the adjoint Lagrangian method will obtain a speedup 
of $O(m)$ over a measurement-at-a-time adjoint analysis. Although 
the gradients of the composite merit function can be 
computed by this method, the gradients of the individual 
measurements are not computed and cannot be recovered. 
As we shall see below (in Sections III-D and IV-C), this 
limitation introduces some complications. The next section 
will describe in detail how the gradients of any differentiable 
scalar merit function of any number of circuit measurements 
can be computed with respect to any number of parameters by 
means of a single adjoint analysis. 

C. Adjoint Lagrangian Theory 

This section will describe gradient computation by the 
adjoint Lagrangian method. The basic principle is as follows. 
The gradients of merit functions $f_1$ and $f_2$ can be computed 
by exciting the adjoint circuit with impulses $p_1$ and $p_2$, 
respectively. It turns out that the gradient of $f_1 + f_2$ can be 
computed by exciting the adjoint circuit with both impulses 
$p_1$ and $p_2$. By considering the merit function as a linear 
combination of several simple functions, we can calculate 
the gradient of the merit functions by exciting the adjoint 
circuit with several impulses together. This is more than just 
linear superposition, as the impulses are applied to a nonlinear 
adjoint circuit. 

The ensuing mathematical derivation will show that the 
gradients of an arbitrary differentiable merit function of sev- 
eral measurements can be computed by means of a single 
adjoint analysis. Scaled versions of the appropriate individual 
excitations for measurement-at-a-time adjoint analysis are si- 
multaneously applied to the time-varying adjoint circuit, and 
gradients are computed by means of convolution integrals. 
Suppose the merit function of interest is 



$$\Phi = g(f_1, f_2, \ldots, f_F, c_1, c_2, \ldots, c_C) \tag{12}$$

where $g$ is any differentiable function, $f_j$ are objective functions 
(one in the case of single-criterion optimization, several 
in the case of multicriteria optimization), and $c_j$ are constraints. 
Further, let these objective functions and constraints 
be defined as differentiable functions 

$$f_j = f_j(m_1, m_2, \ldots, m_M), \quad j = 1, \ldots, F$$

$$c_j = c_j(m_1, m_2, \ldots, m_M), \quad j = 1, \ldots, C$$

of the circuit measurements $m_k$. We are required to find 
the gradient of $\Phi$ with respect to all the $x_i$, $i = 1, 2, \ldots, n$, 
parameters of the optimization. Now 

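$$\frac{\partial \Phi}{\partial x_i} = \sum_{j=1}^{F} \frac{\partial g}{\partial f_j}\frac{\partial f_j}{\partial x_i} + \sum_{j=1}^{C} \frac{\partial g}{\partial c_j}\frac{\partial c_j}{\partial x_i} \tag{13}$$

$$= \sum_{j=1}^{F}\left(\frac{\partial g}{\partial f_j}\sum_{k=1}^{M}\frac{\partial f_j}{\partial m_k}\frac{\partial m_k}{\partial x_i}\right) + \sum_{j=1}^{C}\left(\frac{\partial g}{\partial c_j}\sum_{k=1}^{M}\frac{\partial c_j}{\partial m_k}\frac{\partial m_k}{\partial x_i}\right), \quad \forall i. \tag{14}$$

Rearranging the summations 

$$\frac{\partial \Phi}{\partial x_i} = \sum_{k=1}^{M}\left[\sum_{j=1}^{F}\frac{\partial g}{\partial f_j}\frac{\partial f_j}{\partial m_k} + \sum_{j=1}^{C}\frac{\partial g}{\partial c_j}\frac{\partial c_j}{\partial m_k}\right]\frac{\partial m_k}{\partial x_i}, \quad \forall i. \tag{15}$$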



All the terms inside the square brackets of the right-hand 
side of (15) are known once the nominal simulation has been 
completed, since the form of g, the forms of fj and Cj, and the 
nominal values of m k , the measurements, are known. Note that 
the terms within the square brackets can depend on the nominal 
simulation results, as well as optimization parameters such as 
slack variables, Lagrange multipliers, penalty parameters, etc. 
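Now (15) can be rewritten as 

$$\frac{\partial \Phi}{\partial x_i} = \sum_{k=1}^{M} h_k \frac{\partial m_k}{\partial x_i} = \frac{\partial}{\partial x_i}\left[\sum_{k=1}^{M} h_k m_k\right], \quad \forall i \tag{16}$$

where $h_k$ denotes the $k$th bracketed term of (15), 
since $h_k$ can be treated as constants after the nominal simulation. 
We have reduced the problem of finding the gradients of 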
the merit function to that of finding the gradients of a linear 
combination of measurements, provided the corresponding 
coefficients are known, and the functions g, fj, and Cj are 
differentiable. 
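The collapse of the double summation into one weight per measurement can be sketched with plain lists. The function names, the matrix layout, and the small example (one objective, one constraint, three measurements, two parameters) are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def measurement_weights(dg_df: List[float], df_dm: Matrix,
                        dg_dc: List[float], dc_dm: Matrix) -> List[float]:
    """h_k = sum_j dg/df_j * df_j/dm_k + sum_j dg/dc_j * dc_j/dm_k.

    df_dm[j][k] is df_j/dm_k and dc_dm[j][k] is dc_j/dm_k.
    """
    n_meas = len(df_dm[0]) if df_dm else len(dc_dm[0])
    h = [0.0] * n_meas
    for weight, row in list(zip(dg_df, df_dm)) + list(zip(dg_dc, dc_dm)):
        for k, entry in enumerate(row):
            h[k] += weight * entry
    return h


def combined_gradient(h: List[float], dm_dx: Matrix) -> List[float]:
    """dPhi/dx_i = sum_k h_k * dm_k/dx_i, with dm_dx[k][i] = dm_k/dx_i."""
    n_params = len(dm_dx[0])
    return [sum(h[k] * dm_dx[k][i] for k in range(len(h)))
            for i in range(n_params)]


# Hypothetical example: objective f_1 = m_2 - m_1 (a delay),
# constraint c_1 = m_3 - m_1, with dg/df_1 = 1 and dg/dc_1 = 2.
h = measurement_weights(dg_df=[1.0], df_dm=[[-1.0, 1.0, 0.0]],
                        dg_dc=[2.0], dc_dm=[[-1.0, 0.0, 1.0]])

# Assumed measurement gradients (3 measurements by 2 parameters).
dm_dx = [[0.1, 0.0], [0.5, 0.2], [0.3, 0.4]]
gradient = combined_gradient(h, dm_dx)
```

Once `h` is known, the individual gradients in `dm_dx` are only ever needed through the weighted sum, which is what a single adjoint analysis can deliver directly.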

We will demonstrate how to pick adjoint circuit excitations 
so that the right-hand side of (16) can be computed effi- 
ciently. Let each measurement be expressible as a time-domain 
convolution integral of the form 



$$m_k = \int_{t_0}^{t_f} \{v_I, i_V\}\, p_k(\tau)\, d\tau \tag{17}$$

where $v_I$ are voltages of independent current sources, $i_V$ 
are the currents of independent voltage sources, $t_0$ is the 
start time of the transient simulation, $t_f$ is the end time, 
$\{v_I, i_V\}$ denotes one of $v_I$ and $i_V$, and $p_k$ is a time-domain 
function that will be used as the excitation in the adjoint 
circuit at the measurement point. Without loss of generality, 
all measurements can be written in terms of the voltages 
of independent current sources and currents of independent 
voltage sources, since a zero-valued current (voltage) source 
can always be added in parallel (series) with the voltage 
(current) to be measured. For example, a measurement that 
is a voltage value of node $k$ at any time $t$ is expressed 
as $m_k = \int_{t_0}^{t_f} v_{I_k}(\tau)\,\delta(\tau - t)\, d\tau$ so that $p_k$ is a unit Dirac 
impulse at time corresponding to $t$. A measurement that is 
a crossing time (as in the example of Section III-B) requires 
$p_k(\tau) = -\delta(\tau - t_{\mathrm{cross}})/(\dot{v}|_{t=t_{\mathrm{cross}}})$. A power measurement 
that integrates the current through a voltage source from 
time $t_{\mathrm{start}}$ to $t_{\mathrm{end}}$ is easily expressed with $p_k(\tau) = u(\tau - t_{\mathrm{start}}) - u(\tau - t_{\mathrm{end}})$, 
where $u(t)$ is the unit step function. 
The Elmore delay of a signal (see, for example, [28]) can also 
be expressed as an integral. Expression of each measurement 
as a convolution integral is essential to the computation of 
sensitivities by the adjoint method. 
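As an illustration of writing a measurement as an integral against an excitation function, the sketch below evaluates a windowed integral of a current with a difference of unit steps. The ramp current, the time window, and the midpoint-rule discretization are assumptions chosen for illustration.

```python
from typing import Callable


def unit_step(t: float) -> float:
    """Unit step function u(t)."""
    return 1.0 if t >= 0.0 else 0.0


def windowed_measurement(signal: Callable[[float], float],
                         excitation: Callable[[float], float],
                         t0: float, tf: float, n_steps: int = 4000) -> float:
    """Midpoint-rule approximation of the integral of signal * excitation."""
    dt = (tf - t0) / n_steps
    total = 0.0
    for k in range(n_steps):
        tau = t0 + (k + 0.5) * dt
        total += signal(tau) * excitation(tau)
    return total * dt


t_start, t_end = 1.0, 3.0


def window(tau: float) -> float:
    """p(tau) = u(tau - t_start) - u(tau - t_end)."""
    return unit_step(tau - t_start) - unit_step(tau - t_end)


def current(tau: float) -> float:
    """Hypothetical ramp current through the voltage source."""
    return tau


integral = windowed_measurement(current, window, t0=0.0, tf=4.0)
```

For this ramp the exact value of the integral from 1 to 3 is 4, and the discretized sum agrees to within rounding error.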

Following the steps of the derivation in [2], Tellegen's 
theorem [2] is invoked on the nominal circuit and an adjoint 
circuit with the same topology but arbitrary elements. Then 
Tellegen's theorem is again invoked on the perturbed circuit 
and the adjoint circuit. The difference between the two sets 
of resulting equations is integrated over the time period of 
simulation. Time is run backward, and appropriate choices of 
branch constitutive relations are made in the adjoint circuit to 
yield the generalized expression 

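$$\sum_{I}\int_{t_0}^{t_f} (-\delta v_I\, \hat{i}_I)\, dt + \sum_{V}\int_{t_0}^{t_f} (\delta i_V\, \hat{v}_V)\, dt = \sum_{C}\left[ C\,\hat{v}_C(t_f)\,\delta v_C(t_0) - \delta C \int_{t_0}^{t_f} \hat{v}_C\, \dot{v}_C\, dt \right] + \sum_{R} \delta R \int_{t_0}^{t_f} \hat{i}_R\, i_R\, dt + \text{other elements}. \tag{18}$$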



In writing (18), adjoint circuit quantities are represented with 
a caret (^) symbol, and $\delta$ is used to represent perturbations in 
circuit values (not to be confused with the use of $\delta(\cdot)$ for time-domain 
Dirac functions). The terms on the right-hand side have 
been shown only for the linear resistors and capacitors in the 
circuit, but the equation is valid only when summed over all the 
elements of the circuit. Note that $\hat{i}_I$, the current of independent 
current sources in the adjoint circuit, and $\hat{v}_V$, the voltage 
of independent voltage sources in the adjoint circuit, are the 
adjoint excitations that must be chosen in order to express the 
sensitivity function of interest. In general, by proper choice of 
adjoint circuit elements, (18) can be summarized as 

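$$
\sum_{J} \int_{t_0}^{t_f} \left(-\delta v_J\,\hat{i}_J\right) dt
+ \sum_{V} \int_{t_0}^{t_f} \left(\delta i_V\,\hat{v}_V\right) dt
= \sum_{i} \beta_i\,\delta x_i
\tag{19}
$$

where $\delta x_i$ represents the variation in the sensitivity parameters
and $\beta_i$ represents the corresponding convolutions. Now we will
manipulate the left-hand side of (19) to the form $\sum_{k=1}^{M} h_k\,\delta m_k$
by appropriate choices of the $\hat{i}_J$ and $\hat{v}_V$ waveforms, as shown
below.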

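For a voltage measurement

$$\hat{i}_J(\tau) = -\delta(\tau - t_{\text{cross}})$$

For a crossing time measurement

$$\hat{i}_J(\tau) = \frac{\delta(\tau - t_{\text{cross}})}{\left. dv/dt \right|_{t=t_{\text{cross}}}}$$

For a power measurement

$$\hat{v}_V(\tau) = u(\tau - t_{\text{start}}) - u(\tau - t_{\text{end}}). \tag{20}$$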

Equation (20) gives us the current and voltage excitations
that would be applied one at a time if we were interested
in finding the sensitivities of the individual measurements.
Instead, we weight each of the waveforms of (20) by the
corresponding $h_k$. Then, substituting (20) into (19), we obtain

$$
\delta\Phi = \sum_{k=1}^{M} h_k\,\delta m_k = \sum_{i} \beta_i\,\delta x_i.
\tag{21}
$$

Taking (21) to the limit as $\delta x_i \to 0$, the required sensitivities
can be picked off as $\partial\Phi/\partial x_i = \beta_i$ in the course of a single adjoint
analysis. Note that this result cannot be derived from simple
superposition, since the adjoint circuit is a time-varying circuit.
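The bookkeeping behind "one adjoint analysis for all gradients" can be sketched on a static, linear analog. The example below is an assumption made purely for illustration: a hypothetical two-node resistive network with conductances as parameters, not the time-domain SPECS formulation, and in this linear DC setting superposition does hold. It shows that one adjoint solve with the weighted excitation $h$ gives the gradient of $\sum_k h_k m_k$ with respect to every parameter.

```python
def solve2(a, rhs):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(rhs[0] * a[1][1] - a[0][1] * rhs[1]) / det,
            (a[0][0] * rhs[1] - rhs[0] * a[1][0]) / det]


def conductance_matrix(g):
    """Nodal matrix of a hypothetical ladder: g1 to ground, g2 series, g3 to ground."""
    g1, g2, g3 = g
    return [[g1 + g2, -g2], [-g2, g2 + g3]]


def node_voltages(g, source=1.0):
    """Node voltages for a current source driving node 1."""
    return solve2(conductance_matrix(g), [source, 0.0])


def weighted_sum(g, h):
    """Composite function: weighted sum of the two node-voltage measurements."""
    v = node_voltages(g)
    return h[0] * v[0] + h[1] * v[1]


def weighted_gradient(g, h):
    """Gradient of weighted_sum w.r.t. (g1, g2, g3) from ONE adjoint solve."""
    v = node_voltages(g)
    lam = solve2(conductance_matrix(g), h)   # matrix is symmetric, so G^T = G
    # d(sum)/dg_i = -lam^T (dG/dg_i) v, written out per element stamp
    return [-lam[0] * v[0],
            -(lam[0] - lam[1]) * (v[0] - v[1]),
            -lam[1] * v[1]]
```

The measurement-at-a-time alternative would solve the adjoint system once per measurement and combine the results afterward; here the weights are folded into the excitation before the single solve.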

D. Implementation of the Adjoint Lagrangian Formulation 

Some special considerations that were taken into account 
during the implementation of the adjoint Lagrangian formula- 
tion are detailed in this section. 

1) Sorting of Pulses and Impulses: Once the nominal simulation
has been completed, the values of the measurements are
known and the excitations for the adjoint circuit can be created. 
The pulses and impulses of the adjoint circuit are then "quick- 
sorted" in reverse time order to be applied to the adjoint circuit 
in an event-driven manner, after being appropriately scaled. 
Event-driven techniques are used both to analyze the adjoint 
circuit and to carry out the necessary convolutions. In SPECS, 
the convolutions are typically between piecewise constant and 
either piecewise constant or piecewise linear waveforms (see 
[26]-[28]). To speed up the adjoint analysis, the time origin 
is shifted to the time of the first externally applied excitation 
to the adjoint circuit. 
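The scheduling step described above can be sketched as follows. This is a minimal illustration with hypothetical names (`Excitation`, `schedule_adjoint_events`); Python's built-in `sorted` stands in for the quicksort mentioned in the text, and the actual SPECS event data structures are not known from this paper.

```python
from dataclasses import dataclass


@dataclass
class Excitation:
    time: float    # time in the nominal simulation (s)
    kind: str      # "impulse" or "step" (one edge of a pulse)
    weight: float  # scale factor applied to this excitation


def schedule_adjoint_events(excitations):
    """Order excitations for a backward-in-time adjoint analysis.

    Returns (adjoint_time, kind, weight) tuples, latest nominal time
    first. The adjoint time origin is shifted to the first excitation
    that is applied, so the leading entry always has adjoint time 0.
    """
    ordered = sorted(excitations, key=lambda e: e.time, reverse=True)
    if not ordered:
        return []
    origin = ordered[0].time
    return [(origin - e.time, e.kind, e.weight) for e in ordered]
```

Shifting the origin means the adjoint analysis does not spend time stepping through the interval after the last measurement, where no excitation is applied.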



2) Multiple Adjoint Lagrangian Groups: In JiffyTune, typically
a constraint or objective function includes a linear
combination of observable circuit measurements. For example,
the rise time of a transition can be expressed as the difference
between the 10% and 90% crossing times of a node voltage.
Thus, a single objective function or constraint of the opti- 
mization problem typically consists of a "group" of individual 
measurements. "Group adjoint" gradient computation is a 
procedure in which each such group of measurements is treated 
by LANCELOT as a scalar function, whose gradients can be 
computed by a single adjoint analysis. SPECS allows for 
multiple such "adjoint Lagrangian groups." The advantage is 
that the adjoint analysis is performed only as many times as 
the number of groups, rather than as many times as the number 
of individual measurements. As we will see below, multiple 
groups are necessary for computations of the Hessian, the 
matrix of second partial derivatives. 

3) Scaling of the Adjoint Circuit Excitations: Once the tran- 
sient simulation has been completed in a new optimization 
iteration, the measurement values are fed to JiffyTune, while 
SPECS stands by for gradient computation. SPECS communicates
with JiffyTune via a semaphore scheme. LANCELOT
uses the measurement values to update its merit function 
and decides whether to accept the proposed step. If the step 
is rejected, gradient computation is skipped. If the step is 
accepted, LANCELOT provides to JiffyTune the values of the 
slack variables, Lagrange multipliers, penalty parameter, and 
scale factors. As we have already seen, JiffyTune uses these 
values to determine the scaling of each pulse or impulse in 
the form 

$$
h_k = l_k + \sum_{i=1}^{M} n_{ki}\,m_i
\tag{22}
$$

where $h_k$ are the scale factors to be applied to each measurement
and $l_k$ and $n_{ki}$ are elements of a vector and sparse matrix
that are created as a result of the chain ruling described in
(14)-(16).
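One way scale factors of the form "vector plus sparse matrix times measurements" can arise is sketched below. The setup is an assumption, since (14)-(16) are not reproduced here: each constraint is taken to be linear in the measurements, $c_j = \sum_k a_{jk} m_k - t_j$, and the merit function is reduced to $\sum_j \lambda_j c_j + \frac{1}{2\mu}\sum_j c_j^2$ with no objective, slack, or scaling terms. The names `a`, `targets`, `lam`, and `mu` are hypothetical.

```python
def scale_factor_terms(a, targets, lam, mu):
    """Return (l, n) such that h[k] = l[k] + sum_i n[k][i] * m[i]."""
    n_con, n_meas = len(a), len(a[0])
    l = [sum(a[j][k] * (lam[j] - targets[j] / mu) for j in range(n_con))
         for k in range(n_meas)]
    n = [[sum(a[j][k] * a[j][i] for j in range(n_con)) / mu
          for i in range(n_meas)]
         for k in range(n_meas)]
    return l, n


def scale_factors(l, n, m):
    """Evaluate h[k], the derivative of the merit w.r.t. measurement k."""
    return [l[k] + sum(n[k][i] * m[i] for i in range(len(m)))
            for k in range(len(m))]


def merit(a, targets, lam, mu, m):
    """Simplified merit function, used only to check the scale factors."""
    c = [sum(a[j][k] * m[k] for k in range(len(m))) - targets[j]
         for j in range(len(a))]
    return sum(lam[j] * c[j] + c[j] ** 2 / (2.0 * mu) for j in range(len(a)))
```

With a rise-time constraint (difference of two crossing times) and a delay constraint, each row of `a` has only one or two nonzeros, which is why the matrix `n` stays sparse in practice.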

IV. Nonlinear Optimization in JiffyTune

In the introduction, we discussed various techniques used in
circuit tuning, including geometric programming [10]. From the
point of view of mathematical optimization, geometric programming
is an old technique whose development preceded the considerable
advances in the field of nonlinear programming. Because of its
inherent inflexibility, it is rarely the most appropriate method.
Furthermore, most circuit tuning problems as stated are not
geometric problems. One can choose to transform the original
problem into a geometric programming problem, but at the
cost of (possibly considerable) inaccuracy and inflexibility.

Even for moderately sized circuit optimization problems, it
is inappropriate to use genetic algorithms, simulated annealing,
Nelder-Mead pattern search [33], or other heuristic methods
when first derivatives are available. The optimization engine
of JiffyTune is based upon the large-scale gradient-based
nonlinear programming package LANCELOT. We give below
a brief description of LANCELOT.
Fig. 3. Illustration of the generalized Cauchy point.



A. LANCELOT 

The optimization engine of JiffyTune is based upon the 
large-scale nonlinear programming package LANCELOT. The 
kernel algorithm is an adaptation of a trust-region method to 
the general nonlinear optimization problem subject to simple 
bounds. The method is extended to accommodate general con- 
straints by using an augmented Lagrangian formulation, and 
the bounds are handled directly and explicitly via projections 
that are easy to compute. In the context of unconstrained 
optimization, trust-region methods, combining an intuitive 
framework with a powerful and elegant theoretical foundation, 
have led to robust numerical implementations. An excellent 
reference is [34]. The extension to problems with simple 
bounds is relatively straightforward and is illustrated in Fig. 3 
for a quadratic model function and an $\ell_\infty$ trust region.

Essentially, global convergence can be guaranteed, provided 
one does at least as well as the generalized Cauchy point 
($w$ in the figure) that corresponds to the minimum along the
projected gradient path ($x_k \to u \to w$) within the trust region,
where the projection is with respect to the bounds (those either 
provided by the user or implicit in the trust region). If a 
variable, as determined by the generalized Cauchy point, is 
at a bound, it is said to be an activity. Unbounded variables 
are free. Activities are fixed temporarily, thus reducing the 
dimensionality of the search space (from two to one in the 
figure). Then, using only the free variables, the model of the 
objective function is further minimized within the feasible 
region and within the trust region (w is optimal in Fig. 3). 
Thus, one obtains satisfactory asymptotic convergence. The 
function is evaluated at this point to determine how well the 
model predicted the actual change in the objective function. 
If good agreement is obtained, the approximate minimizer
is accepted as the next iterate ($x_{k+1} \leftarrow x_k + s_k$) and
the trust region is expanded. If only moderate agreement is 
obtained, the trust-region size remains unchanged, but the step 
is accepted. Otherwise, no new point is accepted and the trust 
region is contracted. The beauty of such an approach is that, 
when the trust region is small enough and the problem smooth, 
the approximation is necessarily good provided the gradients 
are sufficiently accurate. The above procedure has been proved 
to correctly solve the optimization problem under certain mild 
assumptions. Details are given in [19]. 
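The acceptance logic and the projection described above can be sketched as follows. The thresholds (0.75, 0.25) and the growth and shrink factors are illustrative assumptions, not LANCELOT's actual settings, and both function names are hypothetical. With an $\ell_\infty$ trust region, the feasible set for a step is simply the box formed by intersecting the simple bounds with the trust region, so projection is a componentwise clip.

```python
def project_onto_box(point, center, radius, lower, upper):
    """Clip `point` to (simple bounds) intersected with the l-infinity
    trust region of the given radius around `center`."""
    return [min(max(p, max(lo, c - radius)), min(hi, c + radius))
            for p, c, lo, hi in zip(point, center, lower, upper)]


def trust_region_update(actual_reduction, predicted_reduction, radius,
                        good=0.75, moderate=0.25, grow=2.0, shrink=0.5):
    """Return (step_accepted, new_radius).

    `predicted_reduction` is the decrease promised by the model and is
    assumed positive; `actual_reduction` is the decrease observed when
    the true function is evaluated at the trial point.
    """
    ratio = actual_reduction / predicted_reduction
    if ratio >= good:          # good agreement: accept and expand
        return True, radius * grow
    if ratio >= moderate:      # moderate agreement: accept, keep radius
        return True, radius
    return False, radius * shrink  # poor agreement: reject and contract
```

In JiffyTune each evaluation of `actual_reduction` costs a circuit simulation, which is why a rejected step skips the gradient computation.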

The extension to handle equality constraints is carried out by
means of an augmented Lagrangian function that is minimized
subject to the explicit bounds, using the earlier algorithm.
Inequality constraints are converted to equality constraints 
by first introducing slack or surplus variables. We test for 
convergence by determining if we are sufficiently stationary 
(the projected gradient with respect to the simple bounds of the 
augmented Lagrangian is sufficiently small) and sufficiently 
feasible (the norm of the constraint violations is sufficiently 
small). Otherwise, we use the minimization algorithm to 
find an approximate stationary point subject to the simple 
bounds. If this point is sufficiently feasible, we update the 
associated multipliers of the augmented Lagrangian function 
and decrease the tolerances for stationarity and feasibility. 
Otherwise, we give more weight to feasibility by decreasing 
the penalty parameter and resetting tolerances for stationarity 
and feasibility. It is possible to show, under suitable conditions, 
that convergence to a first-order stationary point for the 
nonlinear programming problem is attained. Further, if there 
is a single limit point, eventually the penalty parameter is not 
reduced. Details of these and other theoretical properties are 
given in [20] and [35]. 
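The outer iteration just described can be sketched on a toy problem. Everything specific here is an assumption for illustration: the problem (minimize $x^2$ subject to $x - 1 = 0$), the exact inner minimizer that stands in for the trust-region solver, the tolerance schedule, and the factor of 0.1 used to decrease the penalty parameter. The merit function has the form $f + \lambda c + c^2/(2\mu)$, so a smaller $\mu$ gives more weight to feasibility.

```python
def inner_minimizer(lam, mu):
    """Exact minimizer of x**2 + lam*(x - 1) + (x - 1)**2 / (2*mu)."""
    return (1.0 - lam * mu) / (2.0 * mu + 1.0)


def augmented_lagrangian(lam=0.0, mu=1.0, first_tol=0.1,
                         final_tol=1e-8, max_outer=50):
    """Toy outer loop: update the multiplier when sufficiently feasible,
    otherwise decrease the penalty parameter and reset the tolerance."""
    feas_tol = first_tol
    x = inner_minimizer(lam, mu)
    for _ in range(max_outer):
        x = inner_minimizer(lam, mu)
        c = x - 1.0                      # constraint violation
        if abs(c) <= final_tol:
            break
        if abs(c) <= feas_tol:
            lam += c / mu                # first-order multiplier update
            feas_tol *= 0.1              # tighten the feasibility tolerance
        else:
            mu *= 0.1                    # give more weight to feasibility
            feas_tol = first_tol         # reset the tolerance
    return x, lam, mu
```

On this problem the penalty parameter is decreased twice and then left alone while the multiplier converges toward $-2$, mirroring the remark that the penalty parameter is eventually not reduced.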

The symmetric linear systems of equations that arise during
the optimization are iteratively solved by using a preconditioned
conjugate gradient method. A good description of
conjugate gradient methods is given in [36], Sections IV-H3 and
IV-H5. The LANCELOT package offers several preconditioners,
and the Schnabel-Eskow preconditioner [37] is used in JiffyTune.
A detailed reference on the LANCELOT package,
including all the available options, is given in [21], which
accompanies the original software.

B. Application of LANCELOT to JiffyTune 

In the context of JiffyTune, it was necessary to make certain 
modifications to LANCELOT to account for the fact that the 
function and gradient values from SPECS, although accurate 
to within small perturbations, are noisy. The introduced errors 
are small but significantly larger than machine precision. 
Because of the complexity of general nonlinear optimization, 
many initializations (such as the choice of the initial trust- 
region radius or quadratic model) are based upon intelligent 
guesses, which cannot, of course, be ideal in all circumstances. 
In the worst case, for functions without noise, unfortunate 
choices can result in inefficiencies, but in the noisy case, they 
can be insurmountable. For example, if the choice of initial 
trust-region radius is 0.001 μm, and a step of that size is
well into the noise region of the problem, then the optimizer 
may never recover. For similar reasons, we had to introduce 
looser tolerances for feasibility, line search discontinuities, and 
bound activities, replacing tolerances in the original software 
that were selected on the basis of machine precision. Finally, in 
order to stop gracefully and predictably, we needed to consider 
step sizes beneath which further progress is unlikely and relate 
stopping criteria to this step size in a robust and consistent 
manner. 

Besides the ramifications of the adjoint Lagrangian im- 
plementation, two other LANCELOT enhancements deserve 
special mention. First, slack or surplus variables corresponding 
to satisfied inequalities are updated at each iteration so that 
the corresponding equality is satisfied exactly. In addition,
we make the following observation: suppose that all the
variables other than slack variables are temporarily fixed.
The remaining augmented Lagrangian function with respect
to the slack variables is a quadratic function and so can be 
optimized analytically. More important, in our context, such 
optimization can be carried out without any further simulation. 
Whenever such an update is consistent with the resulting 
modified convergence theory, the result has been a reduction 
in the number of iterations to convergence. Details are given 
in [38], where they are called "two-step updates" (see also the 
spacer steps of [39]). 
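The analytic slack optimization can be sketched for a single inequality. The sketch assumes the constraint has been written as $c(x) + s = 0$ with $s \ge 0$ and that the slack appears in the merit function only through $\lambda (c + s) + (c + s)^2/(2\mu)$; the function names are hypothetical. With $x$ fixed, this is a convex quadratic in $s$, so its bounded minimizer is the unconstrained minimizer clipped at zero, and no simulation is needed to evaluate it.

```python
def slack_terms(c_value, s, lam, mu):
    """Part of the augmented Lagrangian that depends on the slack s."""
    residual = c_value + s
    return lam * residual + residual * residual / (2.0 * mu)


def optimal_slack(c_value, lam, mu):
    """Minimizer of slack_terms over s >= 0 with everything else fixed.

    Setting the derivative lam + (c + s)/mu to zero gives
    s = -c - lam*mu, which is then projected onto the bound s >= 0.
    """
    return max(0.0, -c_value - lam * mu)
```

When the multiplier is zero and the inequality is satisfied ($c \le 0$), the result is $s = -c$, which makes the corresponding equality hold exactly.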

Second, minimax optimization is handled by the introduc- 
tion of an additional linear variable and the reformulation of 
the problem as a general nonlinear programming problem. For 
example, suppose one had the problem 

$$
\underset{x \in \Re^n}{\text{minimize}}\ \ \underset{i \in \mathcal{M}}{\text{maximum}}\ f_i(x),
\qquad \mathcal{M} = \{1, 2, \ldots, m\}.
\tag{23}
$$

This problem can be reformulated as

$$
\text{minimize } z \quad \text{subject to } z - f_i(x) \ge 0,\ \ i \in \mathcal{M} = \{1, 2, \ldots, m\}.
\tag{24}
$$

Note that, as for slack/surplus variables, the two-step updating
scheme mentioned above can also be exploited for the
introduced minimax variable $z$.
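A small brute-force check can make the equivalence of the two formulations concrete. The sketch below is illustrative only: the functions and the grid are hypothetical, and a grid search stands in for a real optimizer. For any fixed $x$, the smallest $z$ satisfying every constraint $z - f_i(x) \ge 0$ is $\max_i f_i(x)$, so minimizing $z$ over feasible pairs reproduces the minimax value.

```python
def minimax_value(functions, x):
    """Worst (largest) function value at x."""
    return max(f(x) for f in functions)


def is_feasible(functions, x, z):
    """Check the reformulated constraints z - f_i(x) >= 0 for all i."""
    return all(z - f(x) >= 0.0 for f in functions)


def solve_on_grid(functions, grid):
    """Brute-force the reformulated problem on a grid of x values.

    For each x the smallest feasible z is minimax_value(functions, x),
    so the best (x, z) pair minimizes that quantity over the grid.
    """
    best_x = min(grid, key=lambda x: minimax_value(functions, x))
    return best_x, minimax_value(functions, best_x)
```

The added variable $z$ enters the constraints linearly, which is what allows it to be updated analytically in the same way as a slack variable.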

JiffyTune also provides for design for manufacturability 
(DFM). A process corner consists of a set of device model 
parameters (e.g., temperature, power supply voltage, transistor 
length and width, bias) corresponding to a sample point of the 
process space created by inherent variations in manufacturing. 
The sampling is done so as to capture the extremes in the 
distribution of circuit behavior. For example, the parameter 
sets corresponding to typical, best case, and worst case process 
corners are provided. We consider the following abstraction of 
the DFM problem: find the best assignment of transistor (and 
wire) widths so that the tuning problem is optimally solved 
across several process corners.

JiffyTune includes two DFM modes. The less sophisticated 
of these is to carry out a nominal tuning as usual, but in
addition simulate the final tuned circuit once at each process 
corner supplied by the circuit designer. All measurements and 
functions are computed for each of these process corners. They 
are printed in a summary table so that the designer is able 
to understand the statistical variation of the performance of 
the circuit. In the more sophisticated DFM mode, JiffyTune 
simultaneously tunes at all process corners so that there are no 
posttuning surprises. At each iteration, the circuit is simulated 
at each of the process corners provided, including computation 
of the gradients. This analysis takes k times as long if k process 
corners are to be considered. The optimizer simultaneously 
optimizes the circuit at all specified process corners so as to 
satisfy constraints and minimize objective functions across all 
corners. For example, an equality or inequality constraint is 
simply replicated at each process corner; an objective function
is converted to a minimax function, so that the worst function
across all the process corners is minimized; and a minimax
function is converted to a more general minimax function as
follows: if the nominal problem is to minimize the worst of
$t$ functions, then the worst of $tk$ functions is minimized in
this mode, where $k$ is the number of process corners being
simultaneously considered.

C. Hessian Computations in the Adjoint 
Lagrangian Formulation 

In the nonadjoint Lagrangian formulation, the Hessians (ma- 
trices of second partial derivatives) of the objective function 
and each constraint are approximated using low-rank updates. 
This method requires the gradients of the objective function 
and of the constraints, which are not available in the adjoint 
Lagrangian formulation. Consequently, Hessian computations 
in the adjoint Lagrangian formulation need to be treated 
differently. 

1) Hybrid Scheme for the Computation of the Hessian in
the Adjoint Lagrangian Formulation: LANCELOT uses a
quadratic model of the merit function at each iteration. Its
Hessian matrix is built up by low-rank quasi-Newton update
methods. For a quadratic function $f(x)$, the equation
$\nabla^2 f(x)\,d = \nabla f(x + d) - \nabla f(x)$ is satisfied exactly for any
direction $d$. Analogously, the quasi-Newton condition requires
the approximate Hessian $B$ to satisfy $Bd = \gamma$, where $\gamma$
is the appropriate gradient difference. The idea of a low-rank
update (changing B by a matrix whose columns span a space 
of low dimension) is to modify B with new gradient difference 
(or approximate curvature) information at moderate cost while 
maintaining the quasi-Newton condition. Low-rank updates are 
desirable since they make the accompanying linear algebra 
relatively inexpensive. In "minor iterations" of LANCELOT, 
only the problem variables and slacks change. In "major" 
iterations, usually terminated by sufficient stationarity, either 
the penalty parameter or the Lagrange multipliers change, 
depending on whether or not sufficient feasibility has been 
achieved during the inner unconstrained problem. The merit 
function that LANCELOT builds is 

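$$
\Phi(x) = f(x) + \sum_{i} \lambda_i c_i(x) + \frac{1}{2\mu} \sum_{i} c_i^2(x)
\tag{25}
$$

where $f$ is the objective function and $c_i$ are the constraints.
Hence

$$
\nabla\Phi(x) = \nabla f(x) + \sum_{i} \lambda_i \nabla c_i(x) + \frac{1}{\mu} \sum_{i} c_i(x)\,\nabla c_i(x)
\tag{26}
$$

and the Hessian

$$
\nabla^2\Phi(x) = \nabla^2 f(x) + \sum_{i} \lambda_i \nabla^2 c_i(x)
+ \frac{1}{\mu} \sum_{i} \left( c_i(x)\,\nabla^2 c_i(x) + \nabla c_i(x)\,\nabla c_i(x)^T \right).
\tag{27}
$$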

As an initial approximation of $\nabla^2\Phi(x)$ at the outset of the
optimization, $\nabla^2 f$ and $\nabla^2 c_i$ are taken to be zero matrices
(except for the slack/surplus variables or the minimax $z$ variables;
see Section IV-C2), and $\nabla c_i(x)\,\nabla c_i(x)^T$ is computed
explicitly. At each minor iteration, and at the start of each
major iteration where $\lambda$ is changed, low-rank quasi-Newton
updates are used on the approximate composite Hessian matrix
$B_k$. However, every time a major iteration begins for which
$\mu$ is changed, we take as our approximate Hessian
$B_{k+1} = B_k + (1/\mu_{\text{new}} - 1/\mu_{\text{old}}) \sum_i \nabla c_i(x)\,\nabla c_i(x)^T$. Hence, at
the start of the optimization and at the start of each major
iteration for which $\mu$ has changed, the gradient vector of each
individual constraint is required. Since each constraint may 
in turn depend on multiple measurements, multiple adjoint 
Lagrangian groups are used in SPECS. Thus, a hybrid overall 
scheme is employed wherein adjoint Lagrangian gradient 
computation with just one group is used at every minor 
iteration or major iteration for which sufficient feasibility was 
attained, but an adjoint analysis with as many groups as the 
number of constraints is used at the start of each major iteration 
otherwise. Although the latter is more expensive, the reduction 
in iterations due to starting the major iteration with a better 
Hessian approximation more than compensates for the added 
computational cost. 
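The Hessian correction applied when the penalty parameter changes can be sketched directly. The function name and the list-of-lists matrix representation are hypothetical; the correction itself is the rank-one-per-constraint term $(1/\mu_{\text{new}} - 1/\mu_{\text{old}}) \sum_i \nabla c_i \nabla c_i^T$, which is why the individual constraint gradients (and hence multiple adjoint Lagrangian groups) are needed at that point.

```python
def penalty_hessian_correction(b, constraint_gradients, mu_old, mu_new):
    """Return B + (1/mu_new - 1/mu_old) * sum_i g_i g_i^T.

    `b` is the current Hessian approximation (list of rows) and
    `constraint_gradients` holds one gradient vector per constraint.
    """
    scale = 1.0 / mu_new - 1.0 / mu_old
    n = len(b)
    updated = [row[:] for row in b]
    for g in constraint_gradients:
        for r in range(n):
            for c in range(n):
                updated[r][c] += scale * g[r] * g[c]
    return updated
```

For linear constraints and a zero objective Hessian, the exact merit Hessian is $(1/\mu) \sum_i \nabla c_i \nabla c_i^T$, so the correction maps the exact Hessian for $\mu_{\text{old}}$ onto the exact Hessian for $\mu_{\text{new}}$.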

2) Hessian Updates with Respect to Slack Variables in the 
Adjoint Lagrangian Formulation: By taking advantage of 
the form in which slack variables occur in the augmented 
Lagrangian, the Hessian entries with respect to slack vari- 
ables can be explicitly computed. However, using explicitly 
computed values violates the quasi-Newton condition [36] 
when standard Hessian update formulas are used. Hence, we 
have developed modified update formulas that both satisfy 
the quasi-Newton condition and allow us to assign explicitly 
computed Hessian entries. 

The formula is demonstrated below for the case of a problem 
with an objective function and just one inequality constraint. 
Then 



$$
\Phi = f(x) + \lambda\,(c(x) + s) + \frac{1}{2\mu}\,(c(x) + s)^2
\tag{28}
$$

where $s$ is the slack variable. Clearly

$$
\nabla^2_{ss}\Phi = \frac{1}{\mu}.
\tag{29}
$$




If (29) is applied after the regular Hessian updates, the quasi- 
Newton condition will be violated. Instead, we introduce the 
rank-two update 

$$
B_{k+1} = B_k + \frac{1}{v_k^T E s_k}
\left[ (y_k - B_k s_k)\,v_k^T E + E v_k\,(y_k - B_k s_k)^T \right]
- \frac{(y_k - B_k s_k)^T s_k}{(v_k^T E s_k)^2}\,E v_k v_k^T E
\tag{30}
$$



where at the $i$th iteration, $B_i$ is the Hessian approximation,
$y_i$ is the change in $\nabla\Phi$, $s_i$ is the step, and $E$ is a
diagonal matrix with a zero on the diagonal corresponding
to each slack variable and one otherwise. This novel
update formula preserves the quasi-Newton condition. If
the slack entries are correctly set in $B_0$, then the update
formula (30) guarantees that they remain correctly set
after the update. By choosing $v_k$ as one of $(y_k - B_k s_k)$,
$s_k$, and $y_k$, we obtain modified symmetric rank-one,
Powell-symmetric-Broyden, and Davidon-Fletcher-Powell
(DFP) Hessian updates, respectively. By adding the term
$-(s_k^T E B_k s_k)\,w_k w_k^T$, where
$w_k = E\left( (1/(y_k^T E s_k))\,y_k - (1/(s_k^T E B_k s_k))\,B_k s_k \right)$,
to the modified DFP update, we obtain
the modified Broyden-Fletcher-Goldfarb-Shanno update [36].
One can similarly treat the explicitly available Hessian terms
corresponding to the additional variable $z$ introduced in
Section IV-B to handle minimax problems.
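The two properties claimed for the modified update can be checked numerically. The sketch below assumes the update has the standard symmetric rank-two form with $v_k$ replaced by $E v_k$, that is, $B + [r (Ev)^T + (Ev) r^T]/(v^T E s) - (r^T s)/(v^T E s)^2\,(Ev)(Ev)^T$ with $r = y - Bs$; the helper names and the list-based matrices are hypothetical. Because $Ev$ is zero in every slack position, the slack-slack entries of $B$ are left untouched.

```python
def matvec(a, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def modified_rank_two_update(b, s, y, v, e_diag):
    """Rank-two update that keeps the slack-slack entries of `b` fixed.

    `e_diag` is the diagonal of E: 0.0 for slack variables, 1.0 otherwise.
    The denominator v^T E s is assumed to be nonzero.
    """
    r = [yi - bsi for yi, bsi in zip(y, matvec(b, s))]   # y - B s
    ev = [ei * vi for ei, vi in zip(e_diag, v)]          # E v
    denom = dot(ev, s)                                   # v^T E s
    coeff = dot(r, s) / denom ** 2
    n = len(s)
    return [[b[i][j] + (r[i] * ev[j] + ev[i] * r[j]) / denom
             - coeff * ev[i] * ev[j]
             for j in range(n)]
            for i in range(n)]
```

Passing `v = s` gives the modified Powell-symmetric-Broyden variant and `v = y` the modified DFP variant; in both cases the updated matrix times `s` reproduces `y`.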

V. JiffyTune Interface and Environment 

The JiffyTune engine as reviewed in Section II is driven by 
a textual control file that describes the optimization problem. 
From the inception of the JiffyTune project, it was realized 
that generation of such files should be automated as much 
as possible and that a good human interface and an intuitive 
abstraction of its use and behavior would be crucial to accep- 
tance of the tool by circuit designers. Interfaces were built to 
run the tool from the Cadence [40] and SLED [41] schematic 
design systems. Integrating the tool into such a framework 
capitalizes on the familiarity of the user with the schematic 
design environment and lends a visual and interactive aspect 
to the tool. Many of the complexities are hidden from the 
designer, although care was taken to allow full access to all 
tool functions, if the designer so requires. The basic functions 
of the interface are listed below. 

A. Specification of Tuning Parameters 

Tunable transistors are specified simply by selecting tran- 
sistors or gates on a hierarchical schematic. The tunable 
transistors/gates are visually marked by a flag to indicate 
tunability. Facilities are provided to ratio transistors. Transistor 
ratioing is of particular importance for dynamic logic, for 
example, to maintain a specific ratio between the half-latch 
and evaluate devices in domino-type logic, or to maintain 
transistor ratios in a tapered stack while tuning. In addition, 
similar instances (transistors, gates, or higher level functional 
blocks) can be "grouped" together to ensure that corresponding 
transistors in those blocks track during tuning. 

B. Specification of Measurements and Functions 

Presently, the interface supports delay, transition time 
(slew), area, and power functions. For delay and transition 
times, net selection is done directly on the schematic. Power 
functions are specified by selecting the required voltage 
source, again directly on the schematic. In all cases, the 
user is prompted to provide a relation and target value as 
described in Section II-A. In the schematic environment, with
no knowledge of layout, area targets are approximated by the 
sum of the widths of the tunable transistors and wires. The 
appropriate linear combination of measurements is written to 
the control file in each case. Minimax functions can be defined 
over any set of existing measurements. 

C. Specification of Controls 

Administrative information such as the maximum number
of iterations, location of device model files, process corner
information, desired method for computing the sensitivities,
and layout grid for rounding transistor widths at the end of
optimization can be specified in a form that is prefilled with
project-specific defaults.

TABLE I
Sensitivity Computation Run Time (CPU Seconds Unless Shown as a Percentage)

Run time                                     Direct        Adjoint       Adjoint
                                             method        method        Lagrangian
                                                                         method

Total run time                               [illegible]   [illegible]   2.42
Run time for sensitivity computation only    19.96         10.38         0.37
Run time per sensitivity circuit solution    0.119         0.288         0.37
Run time per sensitivity circuit solution
  as a fraction of simulation time
  (2.03 seconds)                             5.86%         14.19%        18.05%
Run time per gradient computation            3.3 × 10^-3   1.72 × 10^-3  6.12 × 10^-5
Run time per gradient evaluation
  as a fraction of simulation time           0.16%         0.085%        0.003%

D. Execution of JiffyTune 

After specifying parameters, functions, and controls, the 
designer can ask for all this information to be written to a 
control file. Then the designer can launch the JiffyTune engine, 
whereupon the progress of the optimization is displayed. 

E. Back-Annotation of the Results

The results of a JiffyTune run are back-annotated onto the 
schematic as suggested transistor widths next to transistors 
(or as new parameters next to gates). The designer can then 
accept these new widths/parameters, selectively or as a whole. 
Further, a facility is provided to back-annotate final waveform 
characteristics, such as delay through a gate or rise time of 
a net, directly onto the schematics, relieving the designer of
the need to browse through simulation data using a waveform 
viewer. 

F. Utilities 

The JiffyTune menu also includes facilities replicated from 
other areas of the schematic design environment, such as 
schematic saving, netlisting, and automatically adjusting the 
number of fingers on each transistor, to create a single inte- 
grated tuning environment. 

Circuit requirements must be specified with care, since the 
optimizer will take advantage of any unspecified aspects. For 
example, area minimization will shrink to its minimum size a
transistor that does not contribute materially to any measured 
transition. Thus, the tool enforces clear expression of circuit 
requirements that otherwise are often tacit. Since these circuit 
requirements and attributes logically belong with the circuit 
(they are indeed part of the intellectual effort of designing 
the circuit), the tuning parameters and functions are stored in 
the design data base, either as instance properties (tunability, 
upper and lower bounds on transistor widths) or as schematic 
properties (grouping, functions). This practice also encourages
the reuse of circuits; if a tuning problem has been adequately
specified, the circuit can easily be retuned.

VI. Computational Results 

A. Sensitivity Benchmarks 

We report results for the direct method, measurement-at-a- 
time adjoint approach, and adjoint Lagrangian technique as 
implemented in JiffyTune. A dynamic logic "branch scan"
circuit with 144 MOSFETs, an actual circuit from a
high-performance PowerPC microprocessor, was chosen to
demonstrate the efficiency of the gradient computation. The circuit
was simulated in SPECS for a simulation interval of 27 ns.

The CPU time for simulation was 2.03 s on an IBM
RISC/System 6000 model 590. Then the same simulation run
was carried out with 36 sensitivity functions (crossing times) 
and 104 MOS transistor widths as sensitivity parameters. 
Since there were 64 diffusion and other parasitic capacitances 
dependent on these 104 transistor widths, the total number 
of sensitivity parameters was 168. The number of gradients 
computed in this benchmark was 6048, since SPECS finds 
the gradient of every sensitivity function with respect to 
each sensitivity parameter (our Jacobian matrix is dense). The 
run times of SPECS with all three sensitivity computation 
methods (direct, adjoint, and adjoint Lagrangian) on this 
benchmark circuit are shown in Table I. From the table, we see 
that the total run time for a JiffyTune iteration would be 2.42 s 
(assuming that the adjoint Lagrangian method were used). For 
comparison, the AS/X [24] run time on this circuit (with no 
gradient computation, of course) was 40.11 s. Hence, even on 
this modest example, JiffyTune can complete roughly 17 iterations with
gradient computation in the time it takes AS/X to simulate the 
nominal circuit once. 
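The figures quoted in this benchmark can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given in the text and in Table I (sensitivity run times of 19.96 s, 10.38 s, and 0.37 s); the constant and function names are hypothetical.

```python
N_FUNCTIONS = 36      # crossing-time measurements
N_PARAMETERS = 168    # 104 transistor widths + 64 dependent capacitances

# Run time for sensitivity computation only (s), as listed in Table I.
SENSITIVITY_TIME = {"direct": 19.96, "adjoint": 10.38,
                    "adjoint_lagrangian": 0.37}

# Number of sensitivity circuit solutions each method needs.
SOLUTIONS = {"direct": N_PARAMETERS, "adjoint": N_FUNCTIONS,
             "adjoint_lagrangian": 1}


def time_per_solution(method):
    """Run time per sensitivity circuit solution (s)."""
    return SENSITIVITY_TIME[method] / SOLUTIONS[method]


def time_per_gradient(method):
    """Run time per individual gradient entry (s)."""
    return SENSITIVITY_TIME[method] / (N_FUNCTIONS * N_PARAMETERS)


def iterations_per_asx_run(asx_time=40.11, iteration_time=2.42):
    """JiffyTune iterations that fit in one AS/X nominal simulation."""
    return asx_time / iteration_time
```

The product of 36 functions and 168 parameters gives the 6048 gradients mentioned above, and the last ratio evaluates to a little under 17.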

As can be seen from the table, the overhead of computing
one gradient is a fraction of a percent of the original simulation
time, which works out to 61 μs or less of CPU time in this
example. The overhead of one sensitivity circuit analysis is
about 5.9% for the direct method and about 14% for the
adjoint method. Note that the number of runs in the adjoint
method is equal to the number of functions, while it is equal
to the number of sensitivity parameters in the direct method
and is equal to one in the adjoint Lagrangian method. The
higher overhead in the adjoint method is accounted for by the
convolution required between the waveforms of the original
circuit and the sensitivity circuit.

Fig. 4. Run time of gradient computations plotted against number of measurements and parameters. (a) Direct method. (b) Adjoint method. (c) Heuristic choice of method. (d) Adjoint Lagrangian formulation.

Next we tested gradient computation on the same dy- 
namic branch-scan-select circuit while varying the number 
of measurements from 1 to 36 and the number of tunable 
transistors from 1 to 104. Four analyses were conducted for 
each resulting combination. In the first analysis, the direct 
method of sensitivity computation was used. The run time 
of sensitivity computation as a function of the number of 
measurements and parameters is shown in Fig. 4(a). 

As can be seen from the figure, the incremental cost of 
each additional sensitivity parameter is quite high [see the 
bold arrow in Fig. 4(a)], as predicted by the theory. Fig. 4(b) 
shows the run time using the adjoint method, computing the 
gradients of individual measurements. Again, as indicated by 
the bold arrow, the growth of run time with each additional 
measurement is quite high. In Fig. 4(c), the run time of our 
previous production version is shown, in which a heuristic is 
used to pick the sensitivity analysis method. If the parameters 
outnumber the measurements by more than a factor of three, 
the adjoint method is chosen. The figure clearly shows a ridge 
where the program switched from the direct to the adjoint 
method. Finally, Fig. 4(d) shows the run time of the adjoint 
Lagrangian formulation wherein a single adjoint analysis was 
used to compute the gradients of the merit function with 
respect to all parameters. Fig. 4(d) clearly demonstrates not 
only the speedup obtained by the novel method but also the 
relatively slow growth of run time with respect to the number
of parameters and almost imperceptible growth with respect
to the number of measurements.
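The scaling behavior seen in the four panels follows from how many sensitivity circuit analyses each method needs. The sketch below encodes those counts together with the factor-of-three heuristic described above; the function names are hypothetical, and the reading of the heuristic (adjoint when parameters exceed three times the measurements, direct otherwise) is an interpretation of the text.

```python
def analyses_required(method, n_parameters, n_measurements):
    """Number of sensitivity circuit analyses for one gradient evaluation."""
    if method == "direct":
        return n_parameters          # one analysis per parameter
    if method == "adjoint":
        return n_measurements        # one analysis per measurement
    if method == "adjoint_lagrangian":
        return 1                     # one analysis for the merit function
    raise ValueError("unknown method: " + method)


def heuristic_choice(n_parameters, n_measurements, factor=3):
    """Pick direct or adjoint as the previous production version did."""
    if n_parameters > factor * n_measurements:
        return "adjoint"
    return "direct"
```

A factor larger than one is at least consistent with Table I, where a single adjoint analysis (0.288 s) costs more than a single direct analysis (0.119 s) because of the convolutions involved.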

The adjoint Lagrangian formulation was tested on a set 
of 18 benchmark circuits, whose characteristics are shown 
in Table II. An adjoint sensitivity analysis was performed on 
each of these 18 benchmark circuits to compute the gradients 
of individual measurements. Then the sensitivity analysis was 
repeated with an adjoint Lagrangian formulation, using a set 
of weights to form a linear combination of the measurements, 
as shown in (16). The gradients of the former analysis were 
combined in a postprocessing step, using the same weights, 
to compose the gradients of the composite merit function. 
The two sets of gradients were then compared. Across all the 
benchmarks, a total of 707 677 gradients were compared. The 
worst inaccuracy among all these gradients between regular 
adjoint analysis and adjoint Lagrangian analysis was $2.96 \times 10^{-11}$
(in units of either ns/μm or mW/μm), demonstrating
that the adjoint Lagrangian formulation does indeed produce 
the same results. 

The run times and speedups for gradient computation only
and for nominal simulation combined with gradient computation
are shown numerically in Table III and graphically in
Fig. 5.

All CPU times in this paper are on an IBM RISC/System
6000 model 590 workstation. While Table II shows all measurements
as defined by the user, the number of measurements
in Table III and Fig. 5 is the number of "meaningful" measurements,
discounting delay measurements on primary inputs
whose gradients are known to be zero. A speedup of up to
36× in gradient computation is observed on circuits with
TABLE II
Characteristics of Benchmark Circuits

Name                   MOSFETs  Number of   Number     Measurements  Number of    Objective  Minimax
                                parameters  dependent                constraints  function?  problem?

lau2                        24           4         16             3            1  Y          N
morrill                      8           3          3             2            1  N          N
davies3                    235          15         87             2            1  Y          N
durham2                    204          11          2             4            2  N          N
wire                         8           8          3             4            0  Y          N
NovOl power                 17           4          0             3            1  Y          N
fleischer                  228         104         80             5            5  Y          Y
clkgen                      28          17         10             6            5  N          N
NovOl                       17           4          0             6            4  N          N
northrop_cor                15           9          2            16            8  Y          Y
coulman_delay               70          16         48            33           17  N          N
coulman_hot                 70          16         48            33           17  N          N
coulman_cold                70          16         48            33           17  N          N
coulman_delay_minmax        70          16         48            33           17  Y          Y
coulman_hot_minmax          70          16         48            33           17  Y          Y
coulman_cold_minmax         70          16         48            33           17  Y          Y
bultmann                    80          24          0            26           13  Y          Y
10 mux                   6,900          60      4,068            82           41  Y          Y

B. Case Study of JiffyTune Use 

JiffyTune was applied to tune custom circuits in the critical 
paths of a high-performance, dynamic-logic PowerPC micro- 
processor. The circuits consisted of a mix of transistors and 
continuously parameterized gates. The Cadence graphical user 
interface made it possible for designers to use the tool with 
little or no training. JiffyTune was used by 58 designers during 
about 2269 interactive sessions to tune 388 unique circuits. 
More than 6289 successful JiffyTune runs were carried out, 
showing that some circuits were retuned multiple times. The 
results of tuning on one particular benchmark circuit using 
the measurement-at-a-time formulation are presented below. 
Table IV lists the results of running JiffyTune on a 12-way
priority decode circuit under four different conditions. The
circuit contains 70 MOSFETs, and the simulation was run for
35 ns. The tuning runs all had 64 tunable transistors, of which 
16 were independent and 48 dependent. The 17 functions to be 
optimized included the rising delay through four critical paths, 
the falling delay through those paths, the rise/fall times on 
each of the above eight transitions, and an area constraint. For 
confidentiality purposes, the delay requirement on the worst 
of the critical paths has been normalized to 500 time units 
for the table on this benchmark. The table lists the rising 
and falling delay of the four paths being tuned as predicted 
by AS/X on the final design (the worst of the eight delays 
for each run is shown in bold), the total tunable transistor 
area of the circuit, and the total CPU time required to run 
JiffyTune on an IBM Risc/System 6000 model 590. The first
JiffyTune run (HOT) started from a circuit that had previously 
been manually tuned (MANUAL). The worst delay through the 
Fig. 5. Histogram of speedups for the 18 benchmark circuits.

a large number of measurements, which in turn leads to 
a speedup of up to 4.2× per iteration of JiffyTune. Fig. 5
shows the speedup of simulation combined with gradient 
computation, speedup of just the gradient computation, and 
number of nontrivial measurements in each benchmark. From 
the theoretical discussion of Section III-C, the number of 
measurements is an upper bound on the practically achievable 
speedup. Note that on some of the smaller examples, a 
speedup higher than the theoretically predicted speedup is 
achieved due to the granularity of CPU time measurements. 
We remark that the faster gradient computation renders feasible 
the optimization of much larger circuits than was previously 
possible. 
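
The link between the gradient speedup and the per-iteration speedup can be made explicit with a simple model. It assumes, as a simplification not stated in the text, that one iteration consists only of a nominal simulation taking time T_sim and a gradient computation taking time T_grad, and that only the latter is accelerated by a factor S_grad. The numbers are the IOmux entries of Table III.

```latex
\[
  S_{\text{total}}
  = \frac{T_{\text{sim}} + T_{\text{grad}}}
         {T_{\text{sim}} + T_{\text{grad}} / S_{\text{grad}}}
\]
% IOmux (Table III): T_grad = 690 s, S_grad = 36.7, total run time 882 s,
% hence T_sim = 882 - 690 = 192 s under this model.
\[
  S_{\text{total}} \approx \frac{882}{192 + 690/36.7}
                   = \frac{882}{210.8} \approx 4.2,
  \qquad
  \lim_{S_{\text{grad}} \to \infty} S_{\text{total}}
  = \frac{882}{192} \approx 4.6 .
\]
```

Under this model, the total speedup for IOmux is already close to its ceiling, which is set by the simulation time rather than by the gradient computation.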



TABLE III

Sensitivity Computation and Total Speedups

                                                      Sensitivity computation       Total run time
                                                      time (CPU seconds)            (CPU seconds)
Name                  MOSFETs  Number of     Number of   Adjoint  Adjoint   Speedup  Adjoint  Adjoint   Total
                               measurements  parameters           Lagrang.  factor            Lagrang.  speedup
                                                                                                        factor
lau2                  24       1             20          0.12     0.1       1.2      1.42     1.3       1.1
morrill               8        1             6           0.01     0.01      1.0      0.27     0.26      1.0
davies3               235      1             102         0.58     0.48      1.2      12.4     12.8      0.97
durham2               204      2             13          0.46     0.21      2.2      5.86     5.63      1.0
wire                  8        2             11          0.05     0.02      2.5      0.51     0.43      1.2
Nov01_power           17       3             4           0.05     0.02      2.5      0.47     0.43      1.1
fleischer             228      4             184         1.91     0.58      3.3      5.98     4.77      1.3
clkgen                28       5             27          0.31     0.07      4.4      1.3      1.05      1.2
Nov01                 17       6             4           0.09     0.02      4.5      0.51     0.44      1.2
northrop_cor          15       16            11          0.66     0.07      9.4      1.83     1.07      1.7
coulman_delay         70       24            64          6.2      0.35      17.7     8.15     2.32      3.5
coulman_hot           70       24            64          6.2      0.34      18.2     8.14     2.32      3.5
coulman_cold          70       24            64          6.19     0.36      17.2     8.12     2.32      3.5
coulman_delay_minmax  70       24            64          6.2      0.34      18.2     8.14     2.33      3.5
coulman_hot_minmax    70       24            64          6.19     0.35      17.7     8.12     2.34      3.5
coulman_cold_minmax   70       24            64          6.19     0.33      18.8     8.13     2.33      3.5
bultmann              80       26            24          1.91     0.13      14.7     4.03     2.27      1.8
IOmux                 6,900    57            4,128       690      18.8      36.7     882      210       4.2



TABLE IV

JiffyTune Results for 12-Way Priority Decode Circuit. All Delays Are Normalized to a Requirement of 500 Time Units

                          MANUAL  HOT   COLD  DELAY  MINIMAX
Path #1, falling delay    555     494   488   483    497
Path #1, rising delay     471     475   473   469    510
Path #2, falling delay    535     495   495   483    506
Path #2, rising delay     494     488   488   472    524
Path #3, falling delay    561     519   517   497    544
Path #3, rising delay     497     519   519   497    527
Path #4, falling delay    497     494   491   484    516
Path #4, rising delay     462     497   496   485    476
Area                      893     844   849   1148   800
# JiffyTune iterations    -       9     26    16     41
Run time (CPU seconds)    -       172   465   289    716



circuit improved by 7.5%, and the area decreased by 5.0%. The 
second JiffyTune run (COLD) started from an untuned circuit 
in which initial transistor sizes were set to the same default 
value as they would be for a "new design." Comparing the 
results of (HOT) and (COLD) shows that the poor start point 
did not significantly change the final results, but the optimizer 
had to work harder. The next JiffyTune run (DELAY) was set 
up to cause JiffyTune to reach the timing goal of 500 time 
units at all cost. JiffyTune was configured as in run (HOT), 
only with a weight on the area constraint that was a tenth 
of the previous value. The table shows that the goal was 
reached but at a high cost in transistor area. In general, we 
have found that it is important to impose an area constraint, 
since without one, JiffyTune converges to one of many equally 
fast circuits depending on the start point, with some solutions 
more compact than others. The final run used the same start 



point and weights as HOT but formulated the problem as a 
minimax optimization. A solution with a somewhat higher 
delay but lower area was obtained in this case. 
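
As a small illustration of how the runs compare, the sketch below recomputes the worst of the eight delays for each run from the values of Table IV. The dictionary layout and variable names are hypothetical; only the numbers come from the table.

```python
# Delays in normalized time units, transcribed from Table IV, ordered as
# path 1 falling, path 1 rising, ..., path 4 falling, path 4 rising.
delays = {
    "MANUAL":  [555, 471, 535, 494, 561, 497, 497, 462],
    "HOT":     [494, 475, 495, 488, 519, 519, 494, 497],
    "COLD":    [488, 473, 495, 488, 517, 519, 491, 496],
    "DELAY":   [483, 469, 483, 472, 497, 497, 484, 485],
    "MINIMAX": [497, 510, 506, 524, 544, 527, 516, 476],
}
REQUIREMENT = 500  # normalized delay requirement

# The worst delay of a run is the quantity a minimax formulation targets.
worst = {run: max(values) for run, values in delays.items()}
meets_requirement = {run: w <= REQUIREMENT for run, w in worst.items()}

# Relative improvement of the worst delay from the manual design to run HOT.
improvement_hot = 100.0 * (worst["MANUAL"] - worst["HOT"]) / worst["MANUAL"]

for run in delays:
    print(f"{run:8s} worst delay = {worst[run]}  meets 500: {meets_requirement[run]}")
print(f"HOT improves the worst delay by {improvement_hot:.1f}%")
```

Only the DELAY run brings every path under the 500-unit requirement, consistent with the discussion above.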

C. Circuit Optimization 

The benchmark circuits of Table II were optimized using 
the adjoint Lagrangian formulation and the new method of 
Hessian updates. However, unlike the sensitivity comparisons 
of Section VI-A, the nonadjoint Lagrangian results presented 
here included enhancements such as the multiple adjoint La- 
grangian groups used in the hybrid scheme when the multiplier 
changed at a major iteration and gradient skipping for rejected
steps. A comparison of results on the benchmark problems is 
given in Table V. The speedups shown in the last column of 
Table III are further eroded during the optimization procedure 
due to this hybrid scheme's being necessary for such major 






IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 17, NO. 12, DECEMBER 1998 



TABLE V

Overall Comparison of Adjoint Lagrangian Results

                                   Function/Gradient/Iterations
                                   (CPU sec./CPU sec./#)                    Speedup factor
Name     Non-adjoint  Adjoint      Non-adjoint         Adjoint              Gradient  Total
         value (ps)   value (ps)
lau2     164          161          27.10/1.59/25       52.80/2.94/50        0.54      0.54
morrill¹ 0.168        0.526        2.83/0.07/9         3.32/0.10/10         0.70      0.92
dav3     256          264          314.00/5.53/25      409.00/11.20/34      0.49      0.76
dur2     472          472          91.50/1.30/17       90.50/0.92/17        1.41      1.01
wire     15700        13600        13.70/1.15/33       38.60/2.72/100       0.42      0.41
Nvpw     267          262          12.50/0.60/29       42.10/1.20/100       0.50      0.35
fleis    -532         -511         353.00/86.70/88     328.00/38.50/82      2.25      1.19
clkgen¹  1.73         1.54         5.97/0.72/5         12.40/0.77/12        0.94      0.62
Nov01    176          176          15.20/1.11/35       18.70/0.56/43        1.98      0.86
nort     -25.4        -2.47        43.10/8.31/47       25.30/2.22/28        3.74      1.76
co_d     119          122          42.70/34.50/22      54.60/9.48/28        3.64      1.19
co_h     270          290          38.20/37.30/19      35.70/6.35/17        5.87      1.73
co_c     261          263          34.50/20.60/17      50.40/8.50/25        2.42      0.94
c_d_m    73.8         93.9         98.20/127.00/51     98.50/28.90/52       4.39      1.75
c_h_m    75.8         109          57.40/66.60/28      57.00/19.00/27       3.51      1.60
c_c_m    72.5         77.6         157.00/178.00/80    191.00/44.30/99      4.02      1.42
bult     -71.4        -61          48.20/9.13/24       25.20/3.14/13        2.91      1.91
IOmux    -14653       -11625       5137.86/7965.87/27  4992.76/1504.62/27   5.29      2.02



iterations in the adjoint Lagrangian scheme, in addition to 
various other overheads, such as reading and compiling files 
in the process of setting up the optimization problem, the 
optimization routines, and further tasks that are common to 
the old and new implementations. Nevertheless, some gains in 
CPU time were observed with no significant loss in the quality 
of results in almost all cases. The difference in the final result 
between adjoint Lagrangian and nonadjoint Lagrangian tuning 
is due to the Hessian's being approximated differently, result- 
ing in different paths being taken by the optimization program. 
The real benefit of the adjoint Lagrangian formulation is seen 
in larger problems, particularly problems with a large number 
of measurements. 

In all cases except the IOmux benchmark, the problems 
were run until completion. However, in the case of IOmux, the 
stopping criteria were not readily satisfied, and the runs con- 
tinued for more than double the number of iterations without 
improvement. Since this obscures the results, we report IOmux 
after 27 iterations for both adjoint and nonadjoint (when a best 
solution had already been found). Meanwhile, we are seeking 
ways to improve the stopping criteria. Some more detailed 
comments on the results of optimizing the IOmux circuit of 
Tables II and III are presented below.¹

The IOmux circuit-tuning problem was formulated as an 
area minimization with 41 timing constraints and a relatively 
high weight on the area objective function. The area (approx-
imated by the sum of the transistor widths) began at 31 128
μm. The nonadjoint Lagrangian method reduced the area to
14 044.8 μm in the course of 26 optimization iterations, but
the sum of the infeasibilities was 18 340 equivalent ps. After 
27 iterations, the total CPU time required was 218.4 min, 



consisting of 85.6 min of transient simulation and 132.8 CPU 
min of gradient evaluation time. With the adjoint Lagrangian 
formulation, after 27 iterations, the area was 15 185.3 μm but
the sum of the infeasibilities was 12 451 equivalent ps.

The run time reduced to a total of 108.3 min, consisting 
of 83.2 min of transient simulation CPU time and 25.1 min 
of gradient evaluation time. Thus, the overall speedup in the 
optimization was 2.02, while the speedup in the gradient com- 
putation portion was 5.29. Clearly, the gradient computation 
bottleneck has been effectively addressed, leaving the transient 
simulation as the dominant portion of the total run time. We 
remark that the optimization routines in the adjoint Lagrangian 
run occupied a paltry 11 s of CPU time. The gradient speedup
would be equal to the value reported in Table III (i.e., 36.7
rather than 5.29) if a composite adjoint Lagrangian formulation 
were used in every iteration. However, in the hybrid scheme, 
a "group" adjoint Lagrangian formulation is used for some of 
the major iterations. Hence the hybrid scheme reduces the total 
gradient computation speedup. 
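
The shift of the bottleneck can be read directly from the IOmux timings quoted above. The following sketch simply redoes that arithmetic; the variable names are hypothetical, and the 11 s spent in the optimization routines is neglected.

```python
# IOmux timings in CPU minutes after 27 iterations, as quoted in the text.
runs = {
    "nonadjoint Lagrangian": {"simulation": 85.6, "gradient": 132.8},
    "adjoint Lagrangian": {"simulation": 83.2, "gradient": 25.1},
}

totals = {name: t["simulation"] + t["gradient"] for name, t in runs.items()}
gradient_share = {name: t["gradient"] / totals[name] for name, t in runs.items()}

gradient_speedup = (runs["nonadjoint Lagrangian"]["gradient"]
                    / runs["adjoint Lagrangian"]["gradient"])
total_speedup = totals["nonadjoint Lagrangian"] / totals["adjoint Lagrangian"]

for name in runs:
    print(f"{name}: total {totals[name]:.1f} min, "
          f"gradient share {100 * gradient_share[name]:.0f}%")
print(f"gradient speedup {gradient_speedup:.2f}, total speedup {total_speedup:.2f}")
```

Gradient evaluation drops from roughly 61% of the run to roughly 23%, so transient simulation accounts for about three quarters of the remaining time.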

In the above example, the adjoint Lagrangian formulation 
reduced the CPU time of the circuit optimization from over 
218 min to under 108.5 min. This speedup is expected to 
improve further as the method is applied to larger circuits, 
thus rendering such optimizations feasible.² We have recently
successfully tuned a circuit with 18 854 transistors, 12 608
tunable transistors, and six independent parameters in less than 
two hours. Further, the adjoint Lagrangian formulation allows 
additional constraints at a relatively low incremental cost. In 
particular, increasing the number of independent parameters 
does not affect the sensitivity computations at all. It does result 
in a larger problem for the LANCELOT optimization engine, 



¹ On problems with only constraints, the optimizer was instructed to stop if
the sum of the constraint violations was under 2 ps. Hence any value under 
2 ps means that the problem was essentially solved. 



² Indeed, our latest results produced an area of 15 669.2 μm after 15
iterations and 61.57 min, with the sum of infeasibilities being 11 651 ps,
and terminated on its own at iteration 23 after a further 32.55 min of run time.



but the run time for LANCELOT is not a bottleneck. This 
feature has a significant methodology impact, particularly for 
self-timed and dynamic circuits in which the number of timing 
"checks" that have to be satisfied during tuning can be very 
large. In many cases, the independent parameters are picked 
based on ratioing and grouping that the designer chooses in 
order to make the subsequent layout task manageable and the 
circuit "balanced." Less grouping implies a larger number of 
independent parameters, more freedom for the optimizer, and 
often a better result, but more layout work since fewer cells 
can share layouts. 
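
The relation between independent parameters and dependent transistors can be sketched as follows. This is an assumed, simplified model rather than JiffyTune's actual data structure: each tunable width is taken to be a fixed ratio times one independent parameter, and the transistor names, parameter names, and gradient values are all made up for illustration. The gradient with respect to a parameter then follows from the chain rule over the transistors it controls.

```python
# Hypothetical mapping: transistor -> (independent parameter, ratio).
width_map = {
    "M1": ("p_nand", 1.0),
    "M2": ("p_nand", 1.0),   # grouped with M1 (identical instance)
    "M3": ("p_nand", 2.0),   # ratioed: always twice the width of M1
    "M4": ("p_inv", 1.0),
    "M5": ("p_inv", 2.5),
}


def widths(params):
    """Expand the independent parameters into individual transistor widths."""
    return {t: ratio * params[name] for t, (name, ratio) in width_map.items()}


def parameter_gradient(width_gradient):
    """Chain rule: dM/dp = sum over controlled transistors of ratio * dM/dw."""
    grad = {}
    for t, (name, ratio) in width_map.items():
        grad[name] = grad.get(name, 0.0) + ratio * width_gradient[t]
    return grad


params = {"p_nand": 2.0, "p_inv": 4.0}     # independent parameters (made up)
w = widths(params)

# Made-up sensitivities of a merit function M to each transistor width.
dM_dw = {"M1": 0.5, "M2": 0.5, "M3": -0.25, "M4": 1.0, "M5": 0.2}
dM_dp = parameter_gradient(dM_dw)

print(w)
print(dM_dp)
```

With less grouping, `width_map` would reference more distinct parameters, giving the optimizer more freedom while leaving the per-transistor sensitivities themselves unchanged.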

VII. Conclusions and Future Work 

In this paper, we described JiffyTune, a program that
optimizes circuits by adjusting transistor and wire sizes. Jiffy- 
Tune makes use of fast simulation and time-domain gradient 
computation in the circuit simulator SPECS and advanced 
nonlinear numerical techniques in the optimization package 
LANCELOT. Delay, rise/fall time, area, and power opti- 
mization have been implemented. The optimization system 
is flexible and allows ratioing of transistors and grouping 
of identical instances. An intuitive interface including back 
annotation of optimization results onto the schematic has been 
developed. 

The environment in which a circuit will be used and the 
required performance are estimated long before the chip is 
built. By the time the circuit is integrated onto the chip, it 
may no longer be optimally tuned, much to the frustration 
of the design engineer. Changes in loading, changes in the 
specifications, changes in parasitics after extraction, changes in 
technology device models, and remapping to a new technology 
are common occurrences during the course of a project. In such 
situations, retuning at the push of a button without tedious 
respecification is extremely useful. 

With the proposed adjoint Lagrangian formulation for the 
computation of circuit gradients, for the purposes of optimiza- 
tion, the gradients of a merit function can be computed in a 
single adjoint analysis, irrespective of the number of param-
eters or the number of measurements. Speedups of over 30×
were demonstrated in the gradient computation procedure, thus 
addressing the bottleneck in circuit optimization programs. 
This approach to gradient computation has resulted in circuits 
with up to 6900 transistors' being successfully tuned in about 
108 min of CPU time. The largest circuit tuned contains 18 854 
transistors. The low incremental cost of additional constraints 
makes the optimizer amenable to tuning dynamic circuits, 
which typically have a large number of timing constraints. 
Improved methods for performing Hessian updates and better 
stopping criteria are currently being investigated to enhance 
the efficiency of the circuit optimization. 

JiffyTune has been successfully used to tune a number of
circuits on the critical paths of a high-performance micro- 
processor chip, which makes liberal use of dynamic logic. 
It has been particularly useful in tuning tricky pass-gate 
circuits and has been found to enhance design reuse. Further, 
since the optimization process has been made relatively easy 
and automatic for the designer, a paradigm shift has been 



observed; the issue becomes how to correctly specify the 
optimization problem rather than solving the optimization 
problem itself. There are a number of avenues for future work. 
Repeated solution runs of the sensitivity or adjoint circuit are 
independent and therefore amenable to parallel processing. 
Extension to semiinfinite constraints (see, for example, [42] 
and [15]) would allow optimization of circuits while taking 
into account environmental variations such as temperature 
and power-supply voltage. Reformulating the problem to take 
advantage of group partial separability in LANCELOT [21], 
[43] would speed up the optimization. In addition, applications 
to IC manufacturability are being considered. 

We note that JiffyTune in its present form is not directly 
applicable to designs in which gates are chosen from a fixed 
library of cells with a finite set of discrete power levels. 
On the other hand, JiffyTune performs well on hierarchical 
schematics with leaf cells containing any mix of transistors 
and continuously parameterized gates. In practice, JiffyTune 
handles circuits containing pass transistors well, in contrast 
to optimizers based on static timing analysis, since SPECS 
yields electrically true sensitivities, taking into account details 
of the device model such as body effect. JiffyTune makes it 
possible to speed up the design process, make more refined 
designs, and provide better information about performance 
tradeoffs. 